Server Admin Log/Archive 82

2024-06-30

23:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:25 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:17 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:15 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:14 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:13 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:03 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:53 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:53 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:51 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:08 _joe_: deleted failing pod in eqiad for mw-api-ext, which caused the backend errors page

2024-06-29

01:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
01:12 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:10 jclark@cumin1002: START - Cookbook sre.dns.netbox
00:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
00:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
00:03 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
00:00 jclark@cumin1002: START - Cookbook sre.dns.netbox

2024-06-28

23:57 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:56 jclark@cumin1002: START - Cookbook sre.dns.netbox
23:55 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
23:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
23:50 jclark@cumin1002: START - Cookbook sre.dns.netbox
23:42 tzatziki: removing 1 image for legal compliance
23:33 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
23:32 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
23:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip address for cloudcephosd1039 - pt1979@cumin2002"
23:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip address for cloudcephosd1039 - pt1979@cumin2002"
23:29 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
23:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:23 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
23:22 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
23:21 tzatziki: removing 1 image for legal compliance
23:18 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
23:18 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
23:16 tzatziki: removing 1 image for legal compliance
22:50 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
22:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1041']
22:13 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1041']
22:10 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1040']
22:09 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1040']
21:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
21:42 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:42 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1040 - jclark@cumin1002"
21:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
21:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
21:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1040 - jclark@cumin1002"
21:38 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:38 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
21:36 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:35 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
21:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy FORCED
21:34 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:33 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
21:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
21:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy FORCED
21:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
21:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
21:17 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
21:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
21:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
21:11 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:05 sukhe: sudo cumin -b11 "A:cp-text" 'run-puppet-agent'
20:29 sukhe: sudo cumin "A:cp-text" 'disable-puppet "CR 1050672"'
20:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1028.eqiad.wmnet with OS bookworm
20:20 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:19 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1029.eqiad.wmnet with OS bookworm
20:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1028.eqiad.wmnet with reason: host reimage
20:01 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on lists1001.wikimedia.org with reason: decomed
20:01 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on lists1001.wikimedia.org with reason: decomed
20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
19:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1028.eqiad.wmnet with reason: host reimage
19:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
19:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy1029.eqiad.wmnet with OS bookworm
19:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy1028.eqiad.wmnet with OS bookworm
19:37 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
19:36 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1029
19:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
19:35 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1029
19:34 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1028
19:33 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1028
19:31 jclark@cumin1002: START - Cookbook sre.dns.netbox
19:31 jclark@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
19:31 sukhe: sudo cumin -b10 "A:cp-text" "run-puppet-agent --enable 'dont enable'": T368645
19:30 jclark@cumin1002: START - Cookbook sre.dns.netbox
18:22 sukhe: disable puppet on A:cp-text
18:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
18:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
18:05 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
18:05 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
16:43 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2026.codfw.wmnet with OS bullseye
16:36 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:36 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:19 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
16:03 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on mw2300.codfw.wmnet with reason: Reimaging issues
16:03 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on mw2300.codfw.wmnet with reason: Reimaging issues
15:45 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
15:43 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:43 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker2026 - cmooney@cumin1002"
15:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker2026 - cmooney@cumin1002"
15:32 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:25 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2025.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet),cluster=kubernetes,service=kubesvc
15:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:21 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
15:21 andrewbogott: upgraded wikitech-static to 1_42 and php 8.3
15:14 hnowlan: homer 'cr*codfw*' commit 'T351074'
15:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2027.codfw.wmnet with OS bullseye
15:12 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:11 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
15:11 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2028.codfw.wmnet with OS bullseye
15:10 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1027.eqiad.wmnet|wikikube-worker1028.eqiad.wmnet|wikikube-worker1029.eqiad.wmnet|wikikube-worker1030.eqiad.wmnet|wikikube-worker1031.eqiad.wmnet),cluster=kubernetes,service=kubesvc
15:10 claime: Pooling and uncordoning wikikube-worker1027.eqiad.wmnet,wikikube-worker1028.eqiad.wmnet,wikikube-worker1029.eqiad.wmnet,wikikube-worker1030.eqiad.wmnet,wikikube-worker1031.eqiad.wmnet - T351074
15:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2029.codfw.wmnet with OS bullseye
15:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2025.codfw.wmnet with OS bullseye
15:06 jhathaway: mx-in1001 postfix mx testing complete
15:04 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
15:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:00 claime: homer 'cr*eqiad*' commit 'T351074'
14:56 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2027.codfw.wmnet with reason: host reimage
14:54 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
14:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
14:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2029.codfw.wmnet with reason: host reimage
14:47 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
14:46 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2029.codfw.wmnet with reason: host reimage
14:46 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
14:45 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2027.codfw.wmnet with reason: host reimage
14:45 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
14:30 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2029.codfw.wmnet with OS bullseye
14:30 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2028.codfw.wmnet with OS bullseye
14:29 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2027.codfw.wmnet with OS bullseye
14:28 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2025.codfw.wmnet with OS bullseye
14:27 sukhe: sudo cumin "O:durum" "run-puppet-agent"
14:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2330 to wikikube-worker2029
14:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2029
14:26 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2029
14:26 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:26 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2330 to wikikube-worker2029 - hnowlan@cumin1002"
14:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:25 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2330 to wikikube-worker2029 - hnowlan@cumin1002"
14:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:23 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:22 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb1002.eqiad.wmnet with OS bookworm
14:22 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:22 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2330 to wikikube-worker2029
14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2308 to wikikube-worker2028
14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2028
14:22 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2028
14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2308 to wikikube-worker2028 - hnowlan@cumin1002"
14:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:12 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2308 to wikikube-worker2028 - hnowlan@cumin1002"
14:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bullseye
14:08 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
14:07 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2308 to wikikube-worker2028
14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2306 to wikikube-worker2027
14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2027
14:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:07 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2027
14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2306 to wikikube-worker2027 - hnowlan@cumin1002"
14:06 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-in1001.wikimedia.org with reason: email testing
14:05 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx-in1001.wikimedia.org with reason: email testing
14:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
14:04 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:03 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2306 to wikikube-worker2027 - hnowlan@cumin1002"
14:01 jhathaway: ingressing email on mx-in1001, initial test 1hr
14:00 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
14:00 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2306 to wikikube-worker2027
13:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2298 to wikikube-worker2025
13:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2025
13:56 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
13:54 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2025
13:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2298 to wikikube-worker2025 - hnowlan@cumin1002"
13:53 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
13:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1002.eqiad.wmnet with reason: host reimage
13:49 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2298 to wikikube-worker2025 - hnowlan@cumin1002"
13:46 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw2300 to wikikube-worker2026
13:46 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2300 to wikikube-worker2026
13:46 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
13:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1027.eqiad.wmnet with reason: host reimage
13:45 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2298 to wikikube-worker2025
13:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1002.eqiad.wmnet with reason: host reimage
13:43 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1027.eqiad.wmnet with reason: host reimage
13:42 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
13:42 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
13:42 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
13:42 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
13:42 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
13:42 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
13:41 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
13:41 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bookworm
13:41 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
13:38 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
13:38 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
13:38 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
13:37 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host krb1002.eqiad.wmnet with OS bookworm
13:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
13:29 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
13:28 hnowlan: running `decommission` on 5 codfw api appservers
13:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
13:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
13:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
13:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
13:26 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
13:25 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:25 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker102[7-8] - cmooney@cumin1002"
13:24 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker102[7-8] - cmooney@cumin1002"
13:21 cmooney@cumin1002: START - Cookbook sre.dns.netbox
13:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
13:18 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
13:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
13:18 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
13:15 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
13:12 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
13:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: mgmt ip issue
13:05 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: mgmt ip issue
13:01 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bookworm
12:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65547 and previous config saved to /var/cache/conftool/dbconfig/20240628-125926-marostegui.json
12:55 hashar@deploy1002: Finished deploy [gerrit/gerrit@0db053e]: Upgrade Gerrit 3.10.0-32-gf77960412e to 3.10.0-71-gf6e9431fff - T367029 T341291 (duration: 00m 09s)
12:55 hashar@deploy1002: Started deploy [gerrit/gerrit@0db053e]: Upgrade Gerrit 3.10.0-32-gf77960412e to 3.10.0-71-gf6e9431fff - T367029 T341291
12:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1029.eqiad.wmnet with reason: host reimage
12:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1029.eqiad.wmnet with reason: host reimage
12:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
12:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
12:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
12:44 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
12:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65546 and previous config saved to /var/cache/conftool/dbconfig/20240628-124419-marostegui.json
12:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-conf1004
12:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1004
12:21 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1004
12:18 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for an-conf1005,6 - jclark@cumin1002"
12:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for an-conf1005,6 - jclark@cumin1002"
12:15 jclark@cumin1002: START - Cookbook sre.dns.netbox
12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65544 and previous config saved to /var/cache/conftool/dbconfig/20240628-121404-marostegui.json
12:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1030.eqiad.wmnet with OS bullseye
12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
12:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
12:05 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1030.eqiad.wmnet with reason: host reimage
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1031.eqiad.wmnet with OS bullseye
11:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1030.eqiad.wmnet with reason: host reimage
11:50 Dreamy_Jazz: Finished run on `medium.dblist`
11:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: host reimage
11:45 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
11:45 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
11:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
11:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: host reimage
11:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
11:44 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
11:44 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1030.eqiad.wmnet with OS bullseye
11:38 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1030.eqiad.wmnet with OS bullseye
11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1031.eqiad.wmnet with reason: host reimage
11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1031.eqiad.wmnet with reason: host reimage
11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:29 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided) (duration: 00m 44s)
11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
11:29 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
11:29 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided)
11:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
11:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
11:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1031.eqiad.wmnet with OS bullseye
11:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1031.eqiad.wmnet on all recursors
11:18 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1031.eqiad.wmnet on all recursors
11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1450 to wikikube-worker1031
11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1031
11:16 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1031
11:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1450 to wikikube-worker1031 - cgoubert@cumin1002"
11:16 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
11:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
11:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
11:14 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1450 to wikikube-worker1031 - cgoubert@cumin1002"
11:13 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided) (duration: 00m 25s)
11:13 Dreamy_Jazz: Running `foreachwikiindblist medium.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` for T366781. `medium.dblist` does not include `loginwiki` or `metawiki` (which are to be done later).
11:12 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided)
11:11 Dreamy_Jazz: `foreachwikiindblist group1-wikipedia.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` finished running
11:11 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1030.eqiad.wmnet with OS bullseye
11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1030.eqiad.wmnet on all recursors
11:11 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1030.eqiad.wmnet on all recursors
11:09 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1450 to wikikube-worker1031
11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1418 to wikikube-worker1030
11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1030
11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:07 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
11:04 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1030
11:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1418 to wikikube-worker1030 - cgoubert@cumin1002"
11:02 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1418 to wikikube-worker1030 - cgoubert@cumin1002"
11:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
11:00 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:00 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1418 to wikikube-worker1030
10:59 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
10:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1029.eqiad.wmnet on all recursors
10:59 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1029.eqiad.wmnet on all recursors
10:58 Dreamy_Jazz: Running `foreachwikiindblist group1-wikipedia.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200`
10:58 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1417 to wikikube-worker1029
10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1029
10:56 Dreamy_Jazz: Stopped running script at `cawiki`
10:56 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1029
10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1417 to wikikube-worker1029 - cgoubert@cumin1002"
10:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1417 to wikikube-worker1029 - cgoubert@cumin1002"
10:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1417 to wikikube-worker1029
10:51 Dreamy_Jazz: Running `foreachwikiindblist group1.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200`
10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
10:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.eqiad.wmnet on all recursors
10:51 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.eqiad.wmnet on all recursors
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1413 to wikikube-worker1028
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1028
10:49 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1028
10:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1413 to wikikube-worker1028 - cgoubert@cumin1002"
10:48 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1413 to wikikube-worker1028 - cgoubert@cumin1002"
10:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1413 to wikikube-worker1028
10:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
10:45 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.eqiad.wmnet on all recursors
10:45 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.eqiad.wmnet on all recursors
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1412 to wikikube-worker1027
10:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1027
10:44 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:44 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:44 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1027
10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1412 to wikikube-worker1027 - cgoubert@cumin1002"
10:42 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1412 to wikikube-worker1027 - cgoubert@cumin1002"
10:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1412 to wikikube-worker1027
10:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
10:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
10:22 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
10:18 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
10:16 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
10:12 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve2007.codfw.wmnet
09:57 klausman@cumin2002: START - Cookbook sre.hosts.remove-downtime for ml-serve2007.codfw.wmnet
09:37 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:37 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:34 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65543 and previous config saved to /var/cache/conftool/dbconfig/20240628-082946-marostegui.json
08:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
08:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
07:54 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
02:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
02:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:23 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
02:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:16 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:14 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
02:04 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
01:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
01:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
01:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
01:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
01:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
01:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:52 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:51 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:45 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet
00:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5024.eqsin.wmnet with OS bullseye
00:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
00:06 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
00:06 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
00:06 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage

2024-06-27

23:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:33 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
23:33 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5024.eqsin.wmnet with OS bullseye
23:32 eileen: civicrm upgraded from 76c6fed8 to f9782670
23:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
23:18 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
23:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
23:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
23:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65542 and previous config saved to /var/cache/conftool/dbconfig/20240627-231703-marostegui.json
23:05 Dreamy_Jazz: Running `foreachwikiindblist group0.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php` for T366781
23:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65541 and previous config saved to /var/cache/conftool/dbconfig/20240627-230156-marostegui.json
22:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65540 and previous config saved to /var/cache/conftool/dbconfig/20240627-224649-marostegui.json
22:44 eileen: civicrm upgraded from 7747a290 to 76c6fed8
22:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
22:43 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:42 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:41 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:41 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:33 eileen: civicrm upgraded from 3af41401 to 7747a290
22:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65539 and previous config saved to /var/cache/conftool/dbconfig/20240627-223142-marostegui.json
22:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:19 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5024.eqsin.wmnet
22:13 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:13 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:05 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet
22:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5023.eqsin.wmnet with OS bullseye
21:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:53 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:53 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
21:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
21:22 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:22 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:53 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-conf1005
20:51 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1005
20:50 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:50 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-conf1005 - vriley@cumin1002"
20:50 jhuneidi@deploy1002: Finished scap: Backport for testwiki: Enable QuickSurveys (T368459) (duration: 14m 33s)
20:49 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-conf1005 - vriley@cumin1002"
20:47 vriley@cumin1002: START - Cookbook sre.dns.netbox
20:44 jhuneidi@deploy1002: kharlan, jhuneidi: Continuing with sync
20:40 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:40 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
20:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5023.eqsin.wmnet with OS bullseye
20:38 jhuneidi@deploy1002: kharlan, jhuneidi: Backport for testwiki: Enable QuickSurveys (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:37 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:37 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
20:35 jhuneidi@deploy1002: Started scap: Backport for testwiki: Enable QuickSurveys (T368459)
20:34 jhuneidi@deploy1002: Finished scap: Backport for QuickSurveys: Add testing survey configuration (T368459) (duration: 14m 45s)
20:29 jhuneidi@deploy1002: kharlan, jhuneidi: Continuing with sync
20:24 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:24 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
20:22 jhuneidi@deploy1002: kharlan, jhuneidi: Backport for QuickSurveys: Add testing survey configuration (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
20:20 jhuneidi@deploy1002: Started scap: Backport for QuickSurveys: Add testing survey configuration (T368459)
20:17 jhuneidi@deploy1002: Finished scap: Backport for Enable DiscussionTools permalinks on enwiki (T365974) (duration: 11m 09s)
20:16 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5023.eqsin.wmnet
20:11 jhuneidi@deploy1002: jhuneidi, kemayo: Continuing with sync
20:08 jhuneidi@deploy1002: jhuneidi, kemayo: Backport for Enable DiscussionTools permalinks on enwiki (T365974) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 jhuneidi@deploy1002: Started scap: Backport for Enable DiscussionTools permalinks on enwiki (T365974)
20:03 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:03 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
19:55 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet
19:53 ottomata: deleted mw-page-content-change-enrich stuck jobmanager pod: kubectl -n mw-page-content-change-enrich delete pod flink-app-main-859d98c57b-zrgwk - T368667
19:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bookworm
19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5022.eqsin.wmnet with OS bullseye
19:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bookworm
19:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
19:23 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
19:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
19:10 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
19:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
19:07 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bookworm
19:07 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:52 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
18:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
18:36 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5022.eqsin.wmnet with OS bullseye
18:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
18:19 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
18:19 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
18:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
18:15 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:14 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1067.eqiad.wmnet with OS bookworm
18:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
18:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.11 refs T366956
18:12 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1065.eqiad.wmnet with OS bookworm
18:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
18:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:10 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5022.eqsin.wmnet
18:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1066.eqiad.wmnet with OS bookworm
18:08 ejegg: fundraising civicrm upgraded from 13a13f3a to 43fc2c89
18:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bookworm
17:59 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1057.eqiad.wmnet with OS bookworm
17:51 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet
17:50 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
17:47 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
17:45 swfrench@deploy1002: Finished scap: Deploying securityContext changes for T362978 to main release (duration: 04m 09s)
17:43 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
17:41 swfrench@deploy1002: Started scap: Deploying securityContext changes for T362978 to main release
17:39 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
17:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:36 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
17:35 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:35 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
17:34 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
17:33 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:33 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:33 swfrench-wmf: canary deployments are healthy, slow-logs still produced, continuing with main deployments for T362978
17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:26 hashar@deploy1002: Finished deploy [gerrit/gerrit@7659481]: Revert "Add image-diff JavaScript plugin (take 2)" (duration: 00m 07s)
17:26 hashar@deploy1002: Started deploy [gerrit/gerrit@7659481]: Revert "Add image-diff JavaScript plugin (take 2)"
17:26 hashar@deploy1002: deploy aborted: Revert Add image-diff JavaScript plugin (take 2) (duration: 00m 00s)
17:26 hashar@deploy1002: Started deploy [gerrit/gerrit@7659481]: Revert Add image-diff JavaScript plugin (take 2)
17:23 swfrench@deploy1002: Finished scap: (no justification provided) (duration: 08m 03s)
17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bookworm
17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bookworm
17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bookworm
17:18 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bookworm
17:17 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bookworm
17:14 swfrench@deploy1002: Started scap: (no justification provided)
17:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@8c6ae73]: Add image-diff JavaScript plugin (take 2) - T341291 (duration: 00m 07s)
17:13 hashar@deploy1002: Started deploy [gerrit/gerrit@8c6ae73]: Add image-diff JavaScript plugin (take 2) - T341291
17:09 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
17:06 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
17:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
16:55 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
16:54 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
16:50 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
16:36 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 100%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65537 and previous config saved to /var/cache/conftool/dbconfig/20240627-163635-arnaudb.json
16:35 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1022.eqiad.wmnet|wikikube-worker1023.eqiad.wmnet|wikikube-worker1024.eqiad.wmnet|wikikube-worker1025.eqiad.wmnet|wikikube-worker1026.eqiad.wmnet),cluster=kubernetes,service=kubesvc
16:35 claime: Pooling and uncordoning wikikube-worker1022.eqiad.wmnet,wikikube-worker1023.eqiad.wmnet,wikikube-worker1024.eqiad.wmnet,wikikube-worker1025.eqiad.wmnet,wikikube-worker1026.eqiad.wmnet - T351074
16:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
16:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
16:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
16:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 75%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65536 and previous config saved to /var/cache/conftool/dbconfig/20240627-162129-arnaudb.json
16:18 claime: homer 'cr*eqiad*' commit 'T351074'
16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1026.eqiad.wmnet with OS bullseye
16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1025.eqiad.wmnet with OS bullseye
16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1023.eqiad.wmnet with OS bullseye
16:06 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 50%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65535 and previous config saved to /var/cache/conftool/dbconfig/20240627-160624-arnaudb.json
16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1024.eqiad.wmnet with OS bullseye
16:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1022.eqiad.wmnet with OS bullseye
16:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
16:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync
16:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: sync
15:58 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
15:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
15:56 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5021.eqsin.wmnet with OS bullseye
15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy2005
15:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy2005
15:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1026.eqiad.wmnet with reason: host reimage
15:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1025.eqiad.wmnet with reason: host reimage
15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 25%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65534 and previous config saved to /var/cache/conftool/dbconfig/20240627-155118-arnaudb.json
15:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1023.eqiad.wmnet with reason: host reimage
15:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1024.eqiad.wmnet with reason: host reimage
15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1025.eqiad.wmnet with reason: host reimage
15:43 hnowlan: restarted ferm on 8 failing k8s workers
15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1022.eqiad.wmnet with reason: host reimage
15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1026.eqiad.wmnet with reason: host reimage
15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1024.eqiad.wmnet with reason: host reimage
15:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
15:41 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1023.eqiad.wmnet with reason: host reimage
15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1022.eqiad.wmnet with reason: host reimage
15:36 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 10%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65533 and previous config saved to /var/cache/conftool/dbconfig/20240627-153613-arnaudb.json
15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1025.eqiad.wmnet with OS bullseye
15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1025.eqiad.wmnet on all recursors
15:29 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1025.eqiad.wmnet on all recursors
15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1373 to wikikube-worker1025
15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1025
15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1026.eqiad.wmnet with OS bullseye
15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1026.eqiad.wmnet on all recursors
15:28 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1026.eqiad.wmnet on all recursors
15:27 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1025
15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1373 to wikikube-worker1025 - cgoubert@cumin1002"
15:27 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1024.eqiad.wmnet with OS bullseye
15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1024.eqiad.wmnet on all recursors
15:27 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1024.eqiad.wmnet on all recursors
15:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1023.eqiad.wmnet with OS bullseye
15:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1023.eqiad.wmnet on all recursors
15:26 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1023.eqiad.wmnet on all recursors
15:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1022.eqiad.wmnet with OS bullseye
15:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1022.eqiad.wmnet on all recursors
15:25 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1022.eqiad.wmnet on all recursors
15:25 pmiazga: T367901 mwmaint1002: Ran `mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=rowiki --logwiki=metawiki 'Rui_Filipe_Fernandes' '44_Gabriel'`
15:24 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1373 to wikikube-worker1025 - cgoubert@cumin1002"
15:23 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1007.eqiad.wmnet
15:21 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1373 to wikikube-worker1025
15:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1366 to wikikube-worker1024
15:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 5%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65532 and previous config saved to /var/cache/conftool/dbconfig/20240627-152107-arnaudb.json
15:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1024
15:20 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1024
15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1366 to wikikube-worker1024 - cgoubert@cumin1002"
15:19 pmiazga: T368451 mwmaint1002: Ran `mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Agustín_Antonio_Cardozo' 'Agustín_Cardozo_Cabrera'`
15:18 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1366 to wikikube-worker1024 - cgoubert@cumin1002"
15:17 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1007.eqiad.wmnet
15:16 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:16 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1366 to wikikube-worker1024
15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1404 to wikikube-worker1026
15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1026
15:14 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1026
15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1404 to wikikube-worker1026 - cgoubert@cumin1002"
15:12 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1404 to wikikube-worker1026 - cgoubert@cumin1002"
15:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:10 cgoubert@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
15:10 hashar@deploy1002: Finished deploy [gerrit/gerrit@8c94fee]: Revert "Add image-diff JavaScript plugin" (duration: 00m 07s)
15:09 hashar@deploy1002: Started deploy [gerrit/gerrit@8c94fee]: Revert "Add image-diff JavaScript plugin"
15:09 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1373.eqiad.wmnet
15:09 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1373.eqiad.wmnet
15:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:08 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1404 to wikikube-worker1026
15:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@9652bc3]: Add image-diff JavaScript plugin - T341291 (duration: 00m 07s)
15:04 hashar@deploy1002: Started deploy [gerrit/gerrit@9652bc3]: Add image-diff JavaScript plugin - T341291
15:03 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1007.eqiad.wmnet with OS bullseye
15:02 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@483e8c3] (codfw): Bump kartotherian src to latest master (duration: 02m 49s)
15:00 topranks: rebooting lsw1-e7-eqiad to upgrade JunOS on switch T365988
15:00 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1366.eqiad.wmnet
15:00 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1366.eqiad.wmnet
15:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@483e8c3] (codfw): Bump kartotherian src to latest master
14:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@483e8c3] (eqiad): Bump kartotherian src to latest master (duration: 03m 10s)
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on an-worker[1163-1165].eqiad.wmnet,es1037.eqiad.wmnet,ms-be1078.eqiad.wmnet with reason: JunOS upgrade lsw1-e7-eqiad
14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on an-worker[1163-1165].eqiad.wmnet,es1037.eqiad.wmnet,ms-be1078.eqiad.wmnet with reason: JunOS upgrade lsw1-e7-eqiad
14:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1365 to wikikube-worker1023
14:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1023
14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e7-eqiad,lsw1-e7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e7-eqiad
14:57 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e7-eqiad,lsw1-e7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e7-eqiad
14:56 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@483e8c3] (eqiad): Bump kartotherian src to latest master
14:56 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1023
14:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1365 to wikikube-worker1023 - cgoubert@cumin1002"
14:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1365 to wikikube-worker1023 - cgoubert@cumin1002"
14:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host deploy1003.eqiad.wmnet with OS bullseye
14:52 brennen@deploy1002: Finished deploy [phabricator/deployment@0df351e]: deploy phab1004 for minor update (duration: 00m 32s)
14:52 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:52 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1365 to wikikube-worker1023
14:52 brennen@deploy1002: Started deploy [phabricator/deployment@0df351e]: deploy phab1004 for minor update
14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1359 to wikikube-worker1022
14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1022
14:51 brennen@deploy1002: Finished deploy [phabricator/deployment@0df351e]: test deploy phab2002 (duration: 00m 34s)
14:50 brennen@deploy1002: Started deploy [phabricator/deployment@0df351e]: test deploy phab2002
14:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1022
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1359 to wikikube-worker1022 - cgoubert@cumin1002"
14:48 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1359 to wikikube-worker1022 - cgoubert@cumin1002"
14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e7-eqiad
14:46 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
14:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e7-eqiad
14:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1359 to wikikube-worker1022
14:43 dcaro@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
14:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1037.eqiad.wmnet with reason: T365988
14:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on es1037.eqiad.wmnet with reason: T365988
14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'T365988 - depool es1037', diff saved to https://phabricator.wikimedia.org/P65531 and previous config saved to /var/cache/conftool/dbconfig/20240627-143741-arnaudb.json
14:15 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
14:12 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bullseye
13:54 urbanecm@deploy1002: Finished scap: Backport for CommonSettings: Mark REL1_42 as stable (T359850) (duration: 08m 10s)
13:48 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:46 urbanecm@deploy1002: Started scap: Backport for CommonSettings: Mark REL1_42 as stable (T359850)
13:46 urbanecm@deploy1002: Finished scap: Backport for ptwiki: Enable CommunityConfiguration (T368310) (duration: 08m 58s)
13:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
13:41 urbanecm@deploy1002: urbanecm: Continuing with sync
13:41 urbanecm: Run `mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=ptwiki --force` via mwdebug1001 (T368310)
13:40 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy1003
13:39 urbanecm@deploy1002: urbanecm: Backport for ptwiki: Enable CommunityConfiguration (T368310) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host deploy1003
13:37 urbanecm@deploy1002: Started scap: Backport for ptwiki: Enable CommunityConfiguration (T368310)
13:36 urbanecm@deploy1002: Finished scap: Backport for Enable local uploads for Gilaki Wikipedia (T364673), [noop] Remove $wgRedirectScript, not used since MediaWiki 1.22, CommunityConfiguration: Log info and higher (duration: 10m 22s)
13:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:31 urbanecm@deploy1002: urbanecm, tgr, nmw03: Continuing with sync
13:28 urbanecm@deploy1002: urbanecm, tgr, nmw03: Backport for Enable local uploads for Gilaki Wikipedia (T364673), [noop] Remove $wgRedirectScript, not used since MediaWiki 1.22, CommunityConfiguration: Log info and higher synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:25 urbanecm@deploy1002: Started scap: Backport for Enable local uploads for Gilaki Wikipedia (T364673), [noop] Remove $wgRedirectScript, not used since MediaWiki 1.22, CommunityConfiguration: Log info and higher
13:24 urbanecm@deploy1002: Finished scap: Backport for [CheckUser] Stop writing old for event tables migration on all wikis (T360685), testwiki: use shellbox-video for scaling video (T356241), Add VK namespace alias to Azerbaijani Wikibooks (T368237) (duration: 16m 48s)
13:19 urbanecm@deploy1002: urbanecm, dreamrimmer, hnowlan, dreamyjazz: Continuing with sync
13:12 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
13:10 urbanecm@deploy1002: urbanecm, dreamrimmer, hnowlan, dreamyjazz: Backport for [CheckUser] Stop writing old for event tables migration on all wikis (T360685), testwiki: use shellbox-video for scaling video (T356241), Add VK namespace alias to Azerbaijani Wikibooks (T368237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:08 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
13:08 urbanecm@deploy1002: Started scap: Backport for [CheckUser] Stop writing old for event tables migration on all wikis (T360685), testwiki: use shellbox-video for scaling video (T356241), Add VK namespace alias to Azerbaijani Wikibooks (T368237)
13:02 sukhe: A:dnsbox: remove 10.3.0.2/32 from /e/n/i
12:52 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65529 and previous config saved to /var/cache/conftool/dbconfig/20240627-125019-marostegui.json
12:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
12:50 sukhe: sudo cumin 'A:dnsbox' 'rm /var/lib/dnsbox/ntp.state': remove obsolete ntp.state file
12:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65528 and previous config saved to /var/cache/conftool/dbconfig/20240627-124957-marostegui.json
12:48 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2007.codfw.wmnet with reason: Hardware maintenance for memory errors
12:48 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2007.codfw.wmnet with reason: Hardware maintenance for memory errors
12:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
12:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
12:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
12:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65527 and previous config saved to /var/cache/conftool/dbconfig/20240627-123805-marostegui.json
12:22 jclark@cumin1002: START - Cookbook sre.dns.netbox
12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65524 and previous config saved to /var/cache/conftool/dbconfig/20240627-121942-marostegui.json
12:17 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
12:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
12:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:12 cmooney@cumin1002: START - Cookbook sre.dns.netbox
12:12 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
12:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
12:12 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
12:11 cmooney@cumin1002: START - Cookbook sre.dns.netbox
12:10 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
12:10 cmooney@cumin1002: START - Cookbook sre.dns.netbox
12:10 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
12:08 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P65523 and previous config saved to /var/cache/conftool/dbconfig/20240627-120751-marostegui.json
12:07 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
12:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
12:07 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
12:07 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
12:07 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65522 and previous config saved to /var/cache/conftool/dbconfig/20240627-120435-marostegui.json
12:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
12:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
11:59 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
11:59 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
11:59 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
11:58 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
11:53 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65521 and previous config saved to /var/cache/conftool/dbconfig/20240627-115244-marostegui.json
11:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
11:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
11:49 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:48 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:47 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
11:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
11:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
11:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
11:46 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
11:46 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
11:42 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:42 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:41 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
11:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:39 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
11:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
11:39 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
11:38 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
11:38 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
11:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
11:36 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
11:36 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
11:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
11:35 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:35 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
11:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:34 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
11:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
11:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
11:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
11:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
11:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
11:29 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
11:29 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
11:28 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:28 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
11:27 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
11:27 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
11:27 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
11:27 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
11:26 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
11:25 cgoubert@deploy1002: Finished scap: Deploy new prometheus-php-fpm-exporter, prometheus-apache-exporter - T283861 (duration: 06m 17s)
11:24 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1004.wikimedia.org with OS bookworm
11:19 cgoubert@deploy1002: Started scap: Deploy new prometheus-php-fpm-exporter, prometheus-apache-exporter - T283861
11:17 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:07 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1004.wikimedia.org with reason: host reimage
11:03 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1004.wikimedia.org with reason: host reimage
11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:51 claime: Deploying new prometheus-php-fpm-exporter, prometheus-apache-exporter to mw-on-k8s and shellbox - T283861
10:49 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1004.wikimedia.org with OS bookworm
10:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2004.wikimedia.org
10:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test2004.wikimedia.org with OS bookworm
10:43 fabfur: re-enabling puppet on A:cp-text_ulsfo (reverted https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050297) (T365718)
10:38 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
10:30 fabfur: correcting previous statement: puppet disabled just on A:cp-text_ulsfo
10:28 fabfur: disable puppet on A:cp-ulsfo to apply selectively https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050258 (T365718)
10:24 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:24 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:20 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:04 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
10:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
09:42 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
09:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
09:40 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test2004.wikimedia.org with reason: host reimage
09:38 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test2004.wikimedia.org with reason: host reimage
09:21 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test2004.wikimedia.org with OS bookworm
09:14 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
09:13 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test2004.wikimedia.org on all recursors
09:13 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test2004.wikimedia.org on all recursors
09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
09:12 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
09:09 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
09:09 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test2004.wikimedia.org
09:04 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti1019.eqiad.wmnet
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1019.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
09:01 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1019.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1019.eqiad.wmnet
08:48 slyngshede@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test1004.wikimedia.org
08:48 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test1004.wikimedia.org with OS bookworm
08:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65518 and previous config saved to /var/cache/conftool/dbconfig/20240627-084043-marostegui.json
08:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
08:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
08:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc1001.wikimedia.org
08:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc1001.wikimedia.org
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc2001.wikimedia.org
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:20 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
08:10 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65517 and previous config saved to /var/cache/conftool/dbconfig/20240627-081044-jynus.json
08:10 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65516 and previous config saved to /var/cache/conftool/dbconfig/20240627-081016-jynus.json
08:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:59 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65515 and previous config saved to /var/cache/conftool/dbconfig/20240627-075944-jynus.json
07:57 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:57 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:56 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65514 and previous config saved to /var/cache/conftool/dbconfig/20240627-075620-jynus.json
07:54 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65513 and previous config saved to /var/cache/conftool/dbconfig/20240627-075447-jynus.json
07:50 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:45 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65512 and previous config saved to /var/cache/conftool/dbconfig/20240627-074542-jynus.json
07:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:36 kartik@deploy1002: Finished scap: Backport for Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028) (duration: 08m 42s)
07:36 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1004.wikimedia.org with OS bookworm
07:35 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
07:34 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test1004.wikimedia.org on all recursors
07:34 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test1004.wikimedia.org on all recursors
07:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
07:33 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc2001.wikimedia.org
07:32 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
07:31 kartik@deploy1002: kcvelaga, kartik: Continuing with sync
07:30 kartik@deploy1002: kcvelaga, kartik: Backport for Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:27 kartik@deploy1002: Started scap: Backport for Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028)
07:24 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
07:24 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test1004.wikimedia.org
07:18 kartik@deploy1002: Finished scap: Backport for Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465) (duration: 14m 19s)
07:13 kartik@deploy1002: kartik: Continuing with sync
07:06 kartik@deploy1002: kartik: Backport for Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:04 kartik@deploy1002: Started scap: Backport for Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465)
06:45 arnaudb@cumin1002: dbctl commit (dc=all): 'weight es1038 T368401', diff saved to https://phabricator.wikimedia.org/P65510 and previous config saved to /var/cache/conftool/dbconfig/20240627-064506-arnaudb.json
06:40 arnaudb@deploy1002: Finished scap: Backport for Revert "mariadb: disable writes on es6" (duration: 07m 43s)
06:35 arnaudb@deploy1002: arnaudb: Continuing with sync
06:35 arnaudb@deploy1002: arnaudb: Backport for Revert "mariadb: disable writes on es6" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:32 arnaudb@deploy1002: Started scap: Backport for Revert "mariadb: disable writes on es6"
06:23 arnaudb@cumin1002: dbctl commit (dc=all): 'weight es1037 T368401', diff saved to https://phabricator.wikimedia.org/P65509 and previous config saved to /var/cache/conftool/dbconfig/20240627-062338-arnaudb.json
06:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1038 to es6 primary T368401', diff saved to https://phabricator.wikimedia.org/P65508 and previous config saved to /var/cache/conftool/dbconfig/20240627-061639-arnaudb.json
06:15 arnaudb: Starting es6 eqiad failover from es1037 to es1038 - T368401
06:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1038 with weight 0 T368401', diff saved to https://phabricator.wikimedia.org/P65507 and previous config saved to /var/cache/conftool/dbconfig/20240627-061055-arnaudb.json
06:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es6 T368401
06:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es6 T368401
06:09 arnaudb@deploy1002: Finished scap: Backport for mariadb: disable writes on es6 (T368401) (duration: 08m 00s)
06:04 arnaudb@deploy1002: arnaudb: Continuing with sync
06:04 arnaudb@deploy1002: arnaudb: Backport for mariadb: disable writes on es6 (T368401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:01 arnaudb@deploy1002: Started scap: Backport for mariadb: disable writes on es6 (T368401)
03:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
03:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
03:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65506 and previous config saved to /var/cache/conftool/dbconfig/20240627-035544-marostegui.json
03:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P65505 and previous config saved to /var/cache/conftool/dbconfig/20240627-034037-marostegui.json
03:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P65504 and previous config saved to /var/cache/conftool/dbconfig/20240627-032530-marostegui.json
03:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65503 and previous config saved to /var/cache/conftool/dbconfig/20240627-031023-marostegui.json
00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65502 and previous config saved to /var/cache/conftool/dbconfig/20240627-005613-marostegui.json
00:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
00:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
00:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65501 and previous config saved to /var/cache/conftool/dbconfig/20240627-005549-marostegui.json
00:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65500 and previous config saved to /var/cache/conftool/dbconfig/20240627-004042-marostegui.json
00:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65499 and previous config saved to /var/cache/conftool/dbconfig/20240627-002535-marostegui.json
00:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65498 and previous config saved to /var/cache/conftool/dbconfig/20240627-001028-marostegui.json

2024-06-26

23:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
23:26 mutante: people1004 - stopped confd which logs every 3 seconds that it can't find any templates (T356296)
23:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
23:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
23:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65497 and previous config saved to /var/cache/conftool/dbconfig/20240626-231020-marostegui.json
23:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
23:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
23:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65496 and previous config saved to /var/cache/conftool/dbconfig/20240626-230958-marostegui.json
22:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P65495 and previous config saved to /var/cache/conftool/dbconfig/20240626-225451-marostegui.json
22:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
22:41 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5021.eqsin.wmnet
22:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P65494 and previous config saved to /var/cache/conftool/dbconfig/20240626-223944-marostegui.json
22:26 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet
22:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65493 and previous config saved to /var/cache/conftool/dbconfig/20240626-222434-marostegui.json
22:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS bullseye
21:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
21:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
21:40 cjming: end of UTC late backport window
21:38 cjming@deploy1002: Finished scap: Backport for Homepage: don't load yesterdays edits on desktop (T368405) (duration: 08m 48s)
21:33 cjming@deploy1002: cjming, migr: Continuing with sync
21:32 cjming@deploy1002: cjming, migr: Backport for Homepage: don't load yesterdays edits on desktop (T368405) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:29 cjming@deploy1002: Started scap: Backport for Homepage: don't load yesterdays edits on desktop (T368405)
21:29 hashar: restarting CI Jenkins
21:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
21:13 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
21:05 cjming@deploy1002: Finished scap: Backport for Homepage: log rendering time for each module and each wiki (T368405) (duration: 14m 01s)
20:59 eileen: config revision changed from 0b822cd3 to 994e7b81
20:57 cjming@deploy1002: cjming, migr: Continuing with sync
20:55 cjming@deploy1002: cjming, migr: Backport for Homepage: log rendering time for each module and each wiki (T368405) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:51 cjming@deploy1002: Started scap: Backport for Homepage: log rendering time for each module and each wiki (T368405)
20:50 jdrewniak@deploy1002: Finished scap: Backport for Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583) (duration: 08m 09s)
20:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
20:45 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
20:45 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:42 jdrewniak@deploy1002: Started scap: Backport for Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583)
20:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 58s)
20:33 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 27s)
20:28 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5020.eqsin.wmnet
20:21 cjming@deploy1002: Finished scap: Backport for Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969) (duration: 08m 46s)
20:15 cjming@deploy1002: cjming, kgraessle: Continuing with sync
20:14 cjming@deploy1002: cjming, kgraessle: Backport for Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:12 cjming@deploy1002: Started scap: Backport for Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969)
20:08 mutante: lists1001:/lib/systemd/system# rm wmf_auto_restart_apache2.* ; systemctl reset-failed - reaction to monitoring alert "FIRING: SystemdUnitFailed: wmf_auto_restart_apache2.service on lists1001:9100"
20:08 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet
20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5019.eqsin.wmnet with OS bullseye
19:48 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.11 refs T366956
19:40 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 02m 38s)
19:39 jhathaway@deploy1002: Started scap: (no justification provided)
19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
19:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
19:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:16 cmooney@cumin1002: START - Cookbook sre.dns.netbox
19:11 ottomata: re-enabling varnishkafka-eventlogging and varnish /beacon/event handling on cache text nodes. /beacon/event/ redirects which breaks the MediaWikiPingback usage - T238230
19:02 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.11 refs T366956
18:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
18:55 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5019.eqsin.wmnet with OS bullseye
18:26 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:26 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ntp.anycast.wmnet - sukhe@cumin1002"
18:25 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ntp.anycast.wmnet - sukhe@cumin1002"
18:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65490 and previous config saved to /var/cache/conftool/dbconfig/20240626-182355-marostegui.json
18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65489 and previous config saved to /var/cache/conftool/dbconfig/20240626-182333-marostegui.json
18:23 sukhe@cumin1002: START - Cookbook sre.dns.netbox
18:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
18:17 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.11 refs T366956
18:14 sukhe: # etcdctl --username root --endpoints https://conf1007.eqiad.wmnet:4001 rmdir /conftool/v1/pools/${site}/dnsbox/ntp: T366360
18:12 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet
18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P65488 and previous config saved to /var/cache/conftool/dbconfig/20240626-180824-marostegui.json
18:07 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@5121748]: Deploying latest DAGs to analytics Airflow instance. (duration: 00m 39s)
18:06 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@5121748]: Deploying latest DAGs to analytics Airflow instance.
17:59 sukhe: sudo cumin -b10 "A:cp-text" "run-puppet-agent"
17:58 sukhe: sudo cumin -b1 -s30 "A:cp-text" "run-puppet-agent"
17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P65487 and previous config saved to /var/cache/conftool/dbconfig/20240626-175317-marostegui.json
17:51 ottomata: disabling varnishkafka-eventlogging and varnish /beacon/event handling on cache text nodes. Puppet is disabled on all cache text, will test a few at a time first. - T238230
17:46 sukhe: disable puppet in A:cp-text
17:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet
17:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS bullseye
17:39 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent"
17:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65486 and previous config saved to /var/cache/conftool/dbconfig/20240626-173810-marostegui.json
17:37 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 11s)
17:37 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
17:29 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ca1acb34] (duration: 02m 54s)
17:26 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ca1acb34]
17:26 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3] (thin): Regular analytics weekly train THIN [analytics/refinery@ca1acb34] (duration: 04m 12s)
17:22 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3] (thin): Regular analytics weekly train THIN [analytics/refinery@ca1acb34]
17:17 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 03s)
17:17 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
17:16 sukhe: re-enable puppet on A:cp-text
17:14 ladsgroup@deploy1002: Finished scap: Backport for Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), Skip failing ForeignResourceStructureTest (T362425), Skip failing ForeignResourceStructureTest (T362425), Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098) (duration: 08m 52s)
17:09 ladsgroup@deploy1002: ladsgroup: Continuing with sync
17:08 ladsgroup@deploy1002: ladsgroup: Backport for Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), Skip failing ForeignResourceStructureTest (T362425), Skip failing ForeignResourceStructureTest (T362425), Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwd
17:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
17:06 ladsgroup@deploy1002: Started scap: Backport for Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), Skip failing ForeignResourceStructureTest (T362425), Skip failing ForeignResourceStructureTest (T362425), Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)
17:03 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
17:01 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34] (duration: 09m 16s)
16:52 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34]
16:52 sukhe: disable puppet on A:cp-text
16:50 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
16:50 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
16:44 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 03s)
16:44 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
16:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
16:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
16:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
16:27 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34] (duration: 00m 29s)
16:27 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34]
16:25 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5018.eqsin.wmnet
16:15 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 33s)
16:14 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
15:58 sukhe: sudo cumin -b1 -s120 "A:dnsbox and not P{dns6001*}" "run-puppet-agent --enable 'rolling out CR 1048064'"
15:58 sukhe: sudo cumin -b1 -s120 "A:dnsbox and not P{dns6001*}" "run-puppet-agent --enable 'rolling out CR 1049969'"
15:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
15:38 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt2003-dev.codfw.wmnet
15:38 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:38 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
15:36 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:35 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
15:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on logstash1024.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
15:32 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on logstash1024.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
15:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2003-dev.codfw.wmnet
15:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
15:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
15:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
15:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
15:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on logstash1023.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
15:20 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on logstash1023.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
15:16 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt2003-dev.codfw.wmnet
15:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:16 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:15 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bullseye
15:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:14 sukhe: sudo cumin "A:dnsbox" 'disable-puppet "rolling out CR 1048064"'
15:12 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2003-dev.codfw.wmnet
15:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt2002-dev.codfw.wmnet
15:07 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:07 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:06 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:04 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:00 taavi@cumin1002: END (ERROR) - Cookbook sre.puppet.renew-cert (exit_code=97) for cloudcephosd1006.eqiad.wmnet: Renew puppet certificate - taavi@cumin1002
14:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2002-dev.codfw.wmnet
14:58 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt2001-dev.codfw.wmnet
14:58 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:58 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
14:58 taavi@cumin1002: START - Cookbook sre.puppet.renew-cert for cloudcephosd1006.eqiad.wmnet: Renew puppet certificate - taavi@cumin1002
14:57 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
14:54 andrew@cumin1002: START - Cookbook sre.dns.netbox
14:48 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2001-dev.codfw.wmnet
14:40 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2007 to codfw - jhancock@cumin2002"
14:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2007 to codfw - jhancock@cumin2002"
14:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:29 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add shellbox-video vars/config, enable on beta (T356241) (duration: 08m 22s)
14:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 hnowlan, lucaswerkmeister-wmde: Continuing with sync
14:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 hnowlan, lucaswerkmeister-wmde: Backport for Add shellbox-video vars/config, enable on beta (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Add shellbox-video vars/config, enable on beta (T356241)
14:21 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore[1004-1006].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
14:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010) (duration: 07m 57s)
14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
14:14 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:13 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:11 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010)
14:09 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:09 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
14:08 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:07 jforrester@deploy1002: Finished scap: Backport for CodeEditor.vue: add watcher for disabled state (T368504) (duration: 08m 00s)
14:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:06 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:06 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
14:05 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:05 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:04 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:04 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:02 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:02 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore[1004-1006].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
14:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore[2005-2006].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
14:01 jforrester@deploy1002: jforrester: Continuing with sync
14:01 jforrester@deploy1002: jforrester: Backport for CodeEditor.vue: add watcher for disabled state (T368504) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:01 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:01 claime: Deploying statsd-exporter for mw-api-int - T365265
14:01 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database btmwiki (T368066)
14:01 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:59 jforrester@deploy1002: Started scap: Backport for CodeEditor.vue: add watcher for disabled state (T368504)
13:56 Lucas_WMDE: UTC afternoon backport+config window done (I might deploy a few more patches later out-of-window)
13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522), [arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532) (duration: 08m 49s)
13:50 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Continuing with sync
13:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore[2005-2006].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
13:49 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Backport for [u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522), [arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:46 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522), [arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532)
13:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [ltwiki] Add a new 'rollbacker' usergroup (T367993) (duration: 08m 48s)
13:37 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Continuing with sync
13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Backport for [ltwiki] Add a new 'rollbacker' usergroup (T367993) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database btmwiki (T368066)
13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [ltwiki] Add a new 'rollbacker' usergroup (T367993)
13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65481 and previous config saved to /var/cache/conftool/dbconfig/20240626-133239-marostegui.json
13:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
13:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65480 and previous config saved to /var/cache/conftool/dbconfig/20240626-133216-marostegui.json
13:31 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database btmwiki (T368066)
13:30 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging (master $ u=) $ mwscript-k8s namespaceDupes maiwiki -- --fix # T363667, 0 pages/links to fix, i.e. no-op
13:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), maiwiki: Remove 'CA' namespace alias (T363667) (duration: 10m 50s)
13:28 elukey@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
13:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
13:23 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Backport for Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), maiwiki: Remove 'CA' namespace alias (T363667) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:19 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database btmwiki (T368066)
13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), maiwiki: Remove 'CA' namespace alias (T363667)
13:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P65479 and previous config saved to /var/cache/conftool/dbconfig/20240626-131709-marostegui.json
13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [CheckUser] Stop writing old for event tables migration on group1 (T360685) (duration: 12m 09s)
13:13 elukey: reload nginx on registry* nodes (Docker registry) to pick up new logging changes
13:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamyjazz: Continuing with sync
13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamyjazz: Backport for [CheckUser] Stop writing old for event tables migration on group1 (T360685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:03 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [CheckUser] Stop writing old for event tables migration on group1 (T360685)
13:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P65478 and previous config saved to /var/cache/conftool/dbconfig/20240626-130201-marostegui.json
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65476 and previous config saved to /var/cache/conftool/dbconfig/20240626-124654-marostegui.json
12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
12:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65471 and previous config saved to /var/cache/conftool/dbconfig/20240626-121158-marostegui.json
12:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
12:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65470 and previous config saved to /var/cache/conftool/dbconfig/20240626-121136-marostegui.json
11:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65469 and previous config saved to /var/cache/conftool/dbconfig/20240626-115628-marostegui.json
11:55 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:55 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:54 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:54 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:54 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:54 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:52 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:52 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:51 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:51 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:44 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:43 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65468 and previous config saved to /var/cache/conftool/dbconfig/20240626-114121-marostegui.json
11:41 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:40 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:39 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:39 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
11:35 moritzm: installing emacs security updates
11:27 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
11:26 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
11:26 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:26 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
11:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65467 and previous config saved to /var/cache/conftool/dbconfig/20240626-112614-marostegui.json
11:24 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
11:24 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:23 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
11:19 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 fully T363812', diff saved to https://phabricator.wikimedia.org/P65466 and previous config saved to /var/cache/conftool/dbconfig/20240626-111934-jynus.json
11:14 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
11:13 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:12 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
11:07 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
11:07 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:06 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
10:39 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 at 50% T363812', diff saved to https://phabricator.wikimedia.org/P65465 and previous config saved to /var/cache/conftool/dbconfig/20240626-103933-jynus.json
10:25 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 after backup T363812', diff saved to https://phabricator.wikimedia.org/P65464 and previous config saved to /var/cache/conftool/dbconfig/20240626-102523-jynus.json
10:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
10:13 claime: enabling puppet on cp-text - T367949
10:04 claime: enabling puppet on cp4037 - T367949
10:02 claime: disabling puppet on cp-text - T367949
09:59 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:55 slyngs: Update idp.wikimedia.org to CAS 6.6.15.2 (T368503)
09:50 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:48 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
09:46 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
09:44 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:38 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
09:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1002.wikimedia.org with OS bookworm
08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136 T365805', diff saved to https://phabricator.wikimedia.org/P65463 and previous config saved to /var/cache/conftool/dbconfig/20240626-085511-root.json
08:44 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetmaster1003.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
08:42 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for puppetmaster1003.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
08:40 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1002.wikimedia.org with reason: host reimage
08:39 hashar@deploy1002: Finished deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit1003 # T367419 (duration: 00m 43s)
08:39 hashar@deploy1002: Started deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit1003 # T367419
08:38 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1002.wikimedia.org with reason: host reimage
08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65462 and previous config saved to /var/cache/conftool/dbconfig/20240626-083733-marostegui.json
08:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
08:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65461 and previous config saved to /var/cache/conftool/dbconfig/20240626-083711-marostegui.json
08:32 hashar@deploy1002: Finished deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit2002 # T367419 (duration: 00m 48s)
08:31 hashar@deploy1002: Started deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit2002 # T367419
08:25 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1002.wikimedia.org with OS bookworm
08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P65460 and previous config saved to /var/cache/conftool/dbconfig/20240626-082204-marostegui.json
08:11 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1025 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65458 and previous config saved to /var/cache/conftool/dbconfig/20240626-081130-jynus.json
08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1023 as es5 master - this is a NOOP', diff saved to https://phabricator.wikimedia.org/P65457 and previous config saved to /var/cache/conftool/dbconfig/20240626-081014-marostegui.json
08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P65456 and previous config saved to /var/cache/conftool/dbconfig/20240626-080657-marostegui.json
08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Fix weights for es2021 and es2024', diff saved to https://phabricator.wikimedia.org/P65455 and previous config saved to /var/cache/conftool/dbconfig/20240626-080649-marostegui.json
07:59 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1022 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65454 and previous config saved to /var/cache/conftool/dbconfig/20240626-075946-jynus.json
07:54 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 at 100% load', diff saved to https://phabricator.wikimedia.org/P65453 and previous config saved to /var/cache/conftool/dbconfig/20240626-075428-jynus.json
07:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65451 and previous config saved to /var/cache/conftool/dbconfig/20240626-075043-marostegui.json
07:44 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
07:33 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 at 50% load', diff saved to https://phabricator.wikimedia.org/P65449 and previous config saved to /var/cache/conftool/dbconfig/20240626-073304-jynus.json
07:28 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 with low load for warmup', diff saved to https://phabricator.wikimedia.org/P65448 and previous config saved to /var/cache/conftool/dbconfig/20240626-072810-jynus.json
07:03 moritzm: installing emacs security updates
06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db2136 - running 10.11 with minimum weight T365805', diff saved to https://phabricator.wikimedia.org/P65447 and previous config saved to /var/cache/conftool/dbconfig/20240626-065636-marostegui.json
06:52 marostegui: Enable slow query log on db2136 running 10.11 T365805
06:39 marostegui: Install mariadb 10.11 on s4 db2136 (depooled for now) T365805
06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136 T365805', diff saved to https://phabricator.wikimedia.org/P65446 and previous config saved to /var/cache/conftool/dbconfig/20240626-063109-root.json
06:01 marostegui: dbmaint eqiad Drop ipblocks in s1 T367632
05:59 marostegui: dbmaint eqiad Drop ipblocks in s3 T367632
05:57 marostegui: dbmaint eqiad Drop ipblocks in s4 T367632
05:39 ryankemper: [Elastic] `curl -s -X POST https://search.svc.eqiad.wmnet:9243/_cluster/reroute?retry_failed=true` did the trick. Shard initializing, cluster should be back to green soon enough
05:36 ryankemper: [Elastic] One unassigned shard; cluster status yellow. Not a big deal, looks like `shard has exceeded the maximum number of retries [5] on failed allocation attempts`, I'll try a manual `/_cluster/reroute?retry_failed=true`
05:01 marostegui: dbmaint eqiad Drop ipblocks in s5 T367632
04:53 marostegui: dbmaint eqiad Drop ipblocks in s2 T367632
04:51 marostegui: dbmaint eqiad Drop ipblocks in s8 T367632
03:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65445 and previous config saved to /var/cache/conftool/dbconfig/20240626-033955-marostegui.json
03:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
03:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
03:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65444 and previous config saved to /var/cache/conftool/dbconfig/20240626-033933-marostegui.json
03:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P65443 and previous config saved to /var/cache/conftool/dbconfig/20240626-032426-marostegui.json
03:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P65442 and previous config saved to /var/cache/conftool/dbconfig/20240626-030919-marostegui.json
02:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65441 and previous config saved to /var/cache/conftool/dbconfig/20240626-025412-marostegui.json
00:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65440 and previous config saved to /var/cache/conftool/dbconfig/20240626-002103-marostegui.json
00:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
00:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
00:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65439 and previous config saved to /var/cache/conftool/dbconfig/20240626-002041-marostegui.json
00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65438 and previous config saved to /var/cache/conftool/dbconfig/20240626-000534-marostegui.json

2024-06-25

23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65437 and previous config saved to /var/cache/conftool/dbconfig/20240625-235027-marostegui.json
23:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65436 and previous config saved to /var/cache/conftool/dbconfig/20240625-233520-marostegui.json
23:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
23:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
23:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
22:44 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
22:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65435 and previous config saved to /var/cache/conftool/dbconfig/20240625-224249-marostegui.json
22:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
22:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65434 and previous config saved to /var/cache/conftool/dbconfig/20240625-224226-marostegui.json
22:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
22:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P65433 and previous config saved to /var/cache/conftool/dbconfig/20240625-222719-marostegui.json
22:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P65432 and previous config saved to /var/cache/conftool/dbconfig/20240625-221212-marostegui.json
22:10 bvibber: a webVideoTranscode job reported 'No space left on device' from a failed ffmpeg run on mw1446 recently
22:09 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
22:05 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
21:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65431 and previous config saved to /var/cache/conftool/dbconfig/20240625-215705-marostegui.json
21:47 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
20:44 cjming: end of UTC late backport window
20:41 cjming@deploy1002: Finished scap: Backport for Cleanup: Remove wgNavigationTimingSurveyName (T367128) (duration: 08m 29s)
20:36 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
20:35 cjming@deploy1002: jdlrobson, cjming: Backport for Cleanup: Remove wgNavigationTimingSurveyName (T367128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:32 cjming@deploy1002: Started scap: Backport for Cleanup: Remove wgNavigationTimingSurveyName (T367128)
20:31 cjming@deploy1002: Finished scap: Backport for Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373) (duration: 15m 04s)
20:26 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
20:19 cjming@deploy1002: jdlrobson, cjming: Backport for Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:16 cjming@deploy1002: Started scap: Backport for Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373)
20:14 cjming@deploy1002: Finished scap: Backport for Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433) (duration: 08m 36s)
20:11 Emperor: restart swift-proxy on ms-fe2010 ms-fe1011 T360913
20:09 cjming@deploy1002: cjming, bvibber: Continuing with sync
20:08 cjming@deploy1002: cjming, bvibber: Backport for Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:05 cjming@deploy1002: Started scap: Backport for Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433)
20:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
20:01 hashar@deploy1002: Finished deploy [integration/docroot@1eb5f4c]: remove CollaborationKit T368092 (duration: 00m 07s)
20:01 hashar@deploy1002: Started deploy [integration/docroot@1eb5f4c]: remove CollaborationKit T368092
19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65430 and previous config saved to /var/cache/conftool/dbconfig/20240625-192947-marostegui.json
19:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
19:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65429 and previous config saved to /var/cache/conftool/dbconfig/20240625-192910-marostegui.json
19:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
19:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
19:23 sukhe: re-enable puppet on lvs2011
19:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65428 and previous config saved to /var/cache/conftool/dbconfig/20240625-191403-marostegui.json
18:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65426 and previous config saved to /var/cache/conftool/dbconfig/20240625-185856-marostegui.json
18:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
18:49 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
18:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65425 and previous config saved to /var/cache/conftool/dbconfig/20240625-184349-marostegui.json
18:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
18:28 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
18:22 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
18:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.11 refs T366956
18:06 topranks: bringing up link from ssw1-a1-codfw to ssw1-d1-codfw T364095
17:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
17:55 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
17:51 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
17:44 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
17:43 brett: Re-re-pooling lvs2011 - T368165
17:37 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
17:36 brett: Depooling lvs2011 due to elevated socket/tcp errors - T368165
17:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
17:28 brett: Pooling lvs2011 - T368165
17:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65424 and previous config saved to /var/cache/conftool/dbconfig/20240625-172502-marostegui.json
17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
17:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65423 and previous config saved to /var/cache/conftool/dbconfig/20240625-172440-marostegui.json
17:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P65422 and previous config saved to /var/cache/conftool/dbconfig/20240625-170933-marostegui.json
17:06 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
17:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
17:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
17:01 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P65421 and previous config saved to /var/cache/conftool/dbconfig/20240625-165426-marostegui.json
16:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
16:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65420 and previous config saved to /var/cache/conftool/dbconfig/20240625-163919-marostegui.json
16:37 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 100%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65419 and previous config saved to /var/cache/conftool/dbconfig/20240625-163330-arnaudb.json
16:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1437.eqiad.wmnet
16:31 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1437.eqiad.wmnet
16:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw1437.eqiad.wmnet with reason: Resizing disk
16:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw1437.eqiad.wmnet with reason: Resizing disk
16:23 bvibber: running requeueTranscodes for missing audio files on commons (mwmaint1002) cf T368364
16:23 claime: depooling mw1437
16:19 claime: cleaning up shellbox leftover files on mw1437.eqiad.wmnet
16:19 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 75%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65418 and previous config saved to /var/cache/conftool/dbconfig/20240625-161824-arnaudb.json
16:15 claime: Extending vg-srv on mw1437
16:10 brennen@deploy1002: Finished deploy [phabricator/deployment@72ad841]: deploy phab1004 for T368392 - followup T364728 (duration: 00m 39s)
16:10 brennen@deploy1002: Started deploy [phabricator/deployment@72ad841]: deploy phab1004 for T368392 - followup T364728
16:09 brennen@deploy1002: Finished deploy [phabricator/deployment@72ad841]: deploy phab2002 for T368392 - followup T364728 (duration: 00m 33s)
16:08 brennen@deploy1002: Started deploy [phabricator/deployment@72ad841]: deploy phab2002 for T368392 - followup T364728
16:05 brennen: silencing phabricator hosts prior to deploy
16:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 50%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65417 and previous config saved to /var/cache/conftool/dbconfig/20240625-160318-arnaudb.json
15:33 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:33 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs[1011-1021].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 10%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65415 and previous config saved to /var/cache/conftool/dbconfig/20240625-153307-arnaudb.json
15:31 Dreamy_Jazz: Ran `mwscript extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --wiki=testwiki` for T366781
15:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
15:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
15:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
15:20 claime: Deploying statsd to mw-api-ext - T365265
15:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 5%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65414 and previous config saved to /var/cache/conftool/dbconfig/20240625-151802-arnaudb.json
15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@f58dd50]: deploy phab1004 for T368392 (duration: 00m 50s)
15:05 brennen@deploy1002: Started deploy [phabricator/deployment@f58dd50]: deploy phab1004 for T368392
15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@f58dd50]: deploy phab2002 for T368392 (duration: 00m 33s)
15:04 brennen@deploy1002: Started deploy [phabricator/deployment@f58dd50]: deploy phab2002 for T368392
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:00 topranks: rebooting lsw1-e5-eqiad to upgrade JunOS on switch T365986
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 7 hosts with reason: JunOS upgrade lsw1-e5-eqiad
14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 7 hosts with reason: JunOS upgrade lsw1-e5-eqiad
14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e5-eqiad,lsw1-e5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e5-eqiad
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e5-eqiad,lsw1-e5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e5-eqiad
14:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on es1035.eqiad.wmnet with reason: T365986
14:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on es1035.eqiad.wmnet with reason: T365986
14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'T365986 - depool es1035', diff saved to https://phabricator.wikimedia.org/P65413 and previous config saved to /var/cache/conftool/dbconfig/20240625-145558-arnaudb.json
14:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e5-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e5-eqiad
14:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e5-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e5-eqiad
14:50 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:45 urbanecm@deploy1002: Finished scap: Backport for WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) (duration: 11m 45s)
14:40 urbanecm@deploy1002: urbanecm: Continuing with sync
14:40 urbanecm@deploy1002: urbanecm: Backport for WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2005 to codfw - jhancock@cumin2002"
14:35 sukhe: sudo cumin -b1 -s900 "A:dnsbox" "run-puppet-agent --enable 'rolling out CR 1049165' && systemctl restart ntp.service"
14:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2005 to codfw - jhancock@cumin2002"
14:33 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-eqiad - T364383
14:33 urbanecm@deploy1002: Started scap: Backport for WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275)
14:30 vgutierrez: disable puppet on A:cp-eqiad before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049570 - T364383
14:24 dcausse: re-indexing all wikidata entity schemas (T368010)
14:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:17 urbanecm@deploy1002: Finished scap: Backport for Add change tag "Community Configuration" (T366989), Add change tag "Community Configuration" (T366989), WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) (duration: 58m 28s)
14:15 sukhe: sudo cumin "A:dnsbox" 'disable-puppet "rolling out CR 1049165"'
14:12 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs[1011-1021].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
14:11 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
14:10 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
14:09 urbanecm@deploy1002: urbanecm: Continuing with sync
14:05 sukhe: restart pybal on lvs2014
14:02 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
14:02 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
14:01 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
14:01 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
14:00 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
14:00 urbanecm@deploy1002: urbanecm: Backport for Add change tag "Community Configuration" (T366989), Add change tag "Community Configuration" (T366989), WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:59 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
13:59 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
13:54 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
13:51 sukhe: restart pybal on lvs1020
13:44 sukhe: disable puppet on A:lvs and A:codfw for CR 1049560
13:43 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for krb1002 - jclark@cumin1002"
13:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for krb1002 - jclark@cumin1002"
13:39 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:37 mvernon@cumin2002: conftool action : set/pooled=yes; selector: cluster=apus
13:36 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
13:36 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
13:29 vgutierrez: IPIP encapsulation enabled on ldap-ro.eqiad.wikimedia.org - T367861
13:26 vgutierrez: rolling restart of pybal on lvs1020 and lvs1018 - T367861
13:18 urbanecm@deploy1002: Started scap: Backport for Add change tag "Community Configuration" (T366989), Add change tag "Community Configuration" (T366989), WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275)
13:07 fabfur: temporarily disabled puppet on cp4037 to test benthos configuration (T367756)
12:51 cgoubert@deploy1002: Finished scap: Deploy udp2log rate-limiting - T365655 - T368098 (duration: 05m 49s)
12:46 cgoubert@deploy1002: Started scap: Deploy udp2log rate-limiting - T365655 - T368098
12:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
12:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
12:42 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:12 XioNoX: push NTP changes on pfw3
12:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65411 and previous config saved to /var/cache/conftool/dbconfig/20240625-120926-marostegui.json
12:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
12:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
11:58 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-esams - T364383
11:56 vgutierrez: disable puppet on A:cp-esams before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049529 - T364383
11:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:45 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:45 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:40 marostegui: m2 dbmaint eqiad Stop db1217:3322 to clone db1228 T368374
10:12 jmm@deploy1002: Finished scap: (no justification provided) (duration: 03m 30s)
10:11 jmm@deploy1002: Started scap: (no justification provided)
09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 11 hosts with reason: Turning down appserver clusters
09:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 11 hosts with reason: Turning down appserver clusters
09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 25 hosts with reason: Turning down appserver clusters
09:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 25 hosts with reason: Turning down appserver clusters
09:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db[1217,1228].eqiad.wmnet with reason: Cloning
09:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db[1217,1228].eqiad.wmnet with reason: Cloning
09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1228 from dbctl T368374', diff saved to https://phabricator.wikimedia.org/P65409 and previous config saved to /var/cache/conftool/dbconfig/20240625-093454-marostegui.json
09:34 slyngs: Switching idp-test.wikimedia.org to CAS 7
09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1228 T368374', diff saved to https://phabricator.wikimedia.org/P65408 and previous config saved to /var/cache/conftool/dbconfig/20240625-093221-root.json
08:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: full dump
08:45 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: full dump
08:32 jynus@cumin1002: dbctl commit (dc=all): 'Depool es2025', diff saved to https://phabricator.wikimedia.org/P65407 and previous config saved to /var/cache/conftool/dbconfig/20240625-083216-jynus.json
08:31 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: full dump
08:31 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: full dump
08:26 jynus@cumin1002: dbctl commit (dc=all): 'Depool es2022', diff saved to https://phabricator.wikimedia.org/P65406 and previous config saved to /var/cache/conftool/dbconfig/20240625-082649-jynus.json
07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
07:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
07:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
07:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65405 and previous config saved to /var/cache/conftool/dbconfig/20240625-071855-marostegui.json
07:14 marostegui: Optimize pagelinks on old s8 codfw master db2165 dbmaint T364069
07:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Long schema change
07:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Long schema change
07:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P65404 and previous config saved to /var/cache/conftool/dbconfig/20240625-070348-marostegui.json
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2165 T368355', diff saved to https://phabricator.wikimedia.org/P65403 and previous config saved to /var/cache/conftool/dbconfig/20240625-070252-marostegui.json
07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2161 to s8 primary T368355', diff saved to https://phabricator.wikimedia.org/P65402 and previous config saved to /var/cache/conftool/dbconfig/20240625-070127-marostegui.json
07:01 marostegui: Starting s8 codfw failover from db2165 to db2161 - T368355
07:00 arnaudb@deploy1002: Finished scap: Backport for Revert "dbconfig: temporary disable writes on es7" (duration: 07m 47s)
06:55 arnaudb@deploy1002: arnaudb: Continuing with sync
06:55 arnaudb@deploy1002: arnaudb: Backport for Revert "dbconfig: temporary disable writes on es7" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:54 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
06:52 arnaudb@deploy1002: Started scap: Backport for Revert "dbconfig: temporary disable writes on es7"
06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P65401 and previous config saved to /var/cache/conftool/dbconfig/20240625-064841-marostegui.json
06:45 arnaudb@deploy1002: Sync cancelled.
06:45 arnaudb@deploy1002: arnaudb: Backport for Revert "dbconfig: temporary disable writes on es7" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:42 arnaudb@deploy1002: Started scap: Backport for Revert "dbconfig: temporary disable writes on es7"
06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'T368020', diff saved to https://phabricator.wikimedia.org/P65400 and previous config saved to /var/cache/conftool/dbconfig/20240625-064000-arnaudb.json
06:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368355
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2161 with weight 0 T368355', diff saved to https://phabricator.wikimedia.org/P65399 and previous config saved to /var/cache/conftool/dbconfig/20240625-063908-root.json
06:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368355
06:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1039 to es7 primary T368020', diff saved to https://phabricator.wikimedia.org/P65398 and previous config saved to /var/cache/conftool/dbconfig/20240625-063453-arnaudb.json
06:33 arnaudb: Starting es7 eqiad failover from es1035 to es1039 - T368020
06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65397 and previous config saved to /var/cache/conftool/dbconfig/20240625-063334-marostegui.json
06:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1039 with weight 0 T368020', diff saved to https://phabricator.wikimedia.org/P65396 and previous config saved to /var/cache/conftool/dbconfig/20240625-062640-arnaudb.json
06:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es7 T368020
06:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es7 T368020
06:24 arnaudb@deploy1002: Finished scap: Backport for dbconfig: temporary disable writes on es7 (T368020) (duration: 18m 47s)
06:19 arnaudb@deploy1002: arnaudb: Continuing with sync
06:17 arnaudb@deploy1002: arnaudb: Backport for dbconfig: temporary disable writes on es7 (T368020) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:11 marostegui: Drop ipblocks from s7 T367632
06:05 arnaudb@deploy1002: Started scap: Backport for dbconfig: temporary disable writes on es7 (T368020)
06:02 marostegui: Drop ipblocks from s6 T367632
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65395 and previous config saved to /var/cache/conftool/dbconfig/20240625-053312-marostegui.json
05:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
05:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
05:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
05:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
05:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65394 and previous config saved to /var/cache/conftool/dbconfig/20240625-053239-marostegui.json
05:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
05:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.8 (duration: 00m 55s)
03:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.11 refs T366956 (duration: 52m 19s)
03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.11 refs T366956
01:48 brett: Running authdns-update on dns1004 to pool eqsin - T365763
01:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_text,dc=eqsin
01:40 brett: Removing downtime for cp[5017-5024] as nvme drives are installed and hosts back online - T365763
00:43 sukhe: [correction of command] sudo pkill ffmpeg: mw1438, high CPU usage, ffmpeg processes
00:43 sukhe: sudo pkill mpeg: mw1438, high CPU usage, ffmpeg processes
00:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: T365763
00:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on 8 hosts with reason: T365763

2024-06-24

23:02 brett: Running authdns-update on dns1004 to depool eqsin - T365763
23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2003.codfw.wmnet
23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:57 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:53 cwhite@cumin2002: START - Cookbook sre.dns.netbox
22:46 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2003.codfw.wmnet
22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2002.codfw.wmnet
22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:41 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:38 cwhite@cumin2002: START - Cookbook sre.dns.netbox
22:26 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2002.codfw.wmnet
22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2001.codfw.wmnet
22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:24 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
22:18 cwhite@cumin2002: START - Cookbook sre.dns.netbox
22:11 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2001.codfw.wmnet
21:34 inflatador: bking@alert1001 uninstall deb pkg `ripgrep` T368107
21:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-redacteddb1001.eqiad.wmnet
21:00 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
20:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet
20:36 inflatador: bking@alert1001 install `ripgrep` deb pkg T368107
20:22 ladsgroup@deploy1002: Synchronized php-1.43.0-wmf.10/includes/libs/rdbms/loadbalancer/LoadBalancer.php: (no justification provided) (duration: 11m 04s)
20:21 mutante: snapshot1017 - systemctl mask commonsrdf-dump ; systemctl mask commonsjson-dump T368098
20:18 taavi: taavi@snapshot1017 ~ $ sudo systemctl stop commons*.service
20:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bookworm
19:35 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
19:32 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:08 mutante: LDAP - added daphnesmit to group 'wmf' - Phabricator: added dsmit-wmf to WMF-NDA group T368140
19:02 sukhe: ms-fe1009: restart swift-proxy: T360913
18:59 mutante: ms-fe1011 - restarted swift-proxy
18:53 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
18:52 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
18:52 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for 15 hosts
18:50 eevans@cumin1002: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching A:restbase-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
18:50 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
18:50 sukhe: sudo cumin -s1 -b60 'ms-fe1010*,ms-fe1013*' 'systemctl restart swift-proxy'
18:50 mutante: ms-fe1010,ms-fe1013 - restart swift-proxy - T360913
18:48 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotate ChronologyProtector secret (duration: 11m 33s)
18:46 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
18:43 ladsgroup@deploy1002: ladsgroup: Continuing with sync
18:41 ladsgroup@deploy1002: ladsgroup: Rotate ChronologyProtector secret synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:17 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bookworm
18:16 mutante: ms-fe1012:~] $ sudo systemctl restart swift-proxy T360913
18:16 mutante: ms-fe1012:~] $ sudo systemctl restart swift-proxy T360931
18:07 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
18:06 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
18:06 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
18:05 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
18:04 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
18:04 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
18:03 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
18:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
17:57 sukhe: restart pybal on lvs1019
17:56 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
17:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=apus,dc=eqiad
17:50 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
17:50 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
17:49 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
17:48 sbassett@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
17:48 sbassett@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
17:48 sbassett@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
17:48 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
17:47 sbassett@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
17:47 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
17:47 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
17:47 sbassett@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
17:47 sbassett@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
17:46 sbassett@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
17:46 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:46 sbassett@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
17:46 sbassett@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
17:46 sbassett@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
17:45 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:44 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:44 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
17:43 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:34 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=apus,dc=codfw
17:33 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:32 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:28 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bookworm
17:28 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:27 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:23 sukhe: restart pybal on lvs2013
17:20 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:19 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:18 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
17:13 sukhe: restart pybal on lvs1020 and lvs1019
17:09 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:55 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
16:51 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
16:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
16:48 sukhe: restart pybal on lvs1020
16:47 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
16:44 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
16:41 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
16:33 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[1031,1034-1036].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:27 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bookworm
16:20 sukhe: restart pybal on lvs1020
16:01 dancy@deploy1002: Installation of scap version "4.89.0" completed for 1 hosts
16:00 dancy@deploy1002: Installing scap version "4.89.0" for 1 hosts
15:59 sukhe: restart pybal on lvs1020
15:59 dancy@deploy1002: Installing scap version "4.89.0" for 248 hosts
15:57 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[1031,1034-1036].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:50 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:49 sukhe: restart pybal on lvs1020
15:43 vgutierrez: updated termination_state cache haproxy metrics, expect higher CD and CR rates - T367963
15:42 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:29 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
15:29 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
15:20 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
15:20 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
15:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
15:16 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
15:16 elukey@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
15:15 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
15:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
15:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
15:11 mvernon@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T279621)
15:11 claime: Enabling statsd-exporter on mw-jobrunner - T365265
15:11 mvernon@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T279621)
15:09 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-drmrs - T364383
15:08 Emperor: enable/run puppet on eqiad lvs for apus LVS rollout T279621
15:08 Dreamy_Jazz: Afternoon UTC backport window done
15:08 dreamyjazz@deploy1002: Finished scap: Backport for extension-list: Add IPReputation (T360067) (duration: 30m 37s)
15:07 vgutierrez: [fixed url] disable puppet on A:cp-drmrs before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049198 - T364383
15:06 vgutierrez: disable puppet on A:cp-drmrs before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049104 - T364383
15:02 sukhe: restart pybal on lvs2014
15:01 mvernon@cumin1002: END (ERROR) - Cookbook sre.loadbalancer.restart-pybal (exit_code=97) rolling-restart of pybal on A:lvs-secondary-codfw or A:lvs-low-traffic-codfw and A:lvs (T279621)
15:00 dreamyjazz@deploy1002: kharlan, dreamyjazz: Continuing with sync
14:57 dreamyjazz@deploy1002: kharlan, dreamyjazz: Backport for extension-list: Add IPReputation (T360067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:56 mvernon@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw or A:lvs-low-traffic-codfw and A:lvs (T279621)
14:53 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
14:52 Emperor: enable/run puppet on codfw lvs for apus LVS rollout T279621
14:49 Emperor: stop puppet on eqiad/codfw lvs prior to apus LVS rollout T279621
14:48 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on cp4052.ulsfo.wmnet with reason: Upgrade glibc
14:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on cp4052.ulsfo.wmnet with reason: Upgrade glibc
14:47 elukey: depool cp4052 to deploy a new version of glibc - T367978
14:47 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet
14:37 dreamyjazz@deploy1002: Started scap: Backport for extension-list: Add IPReputation (T360067)
14:34 urbanecm@deploy1002: Finished scap: Backport for ptwiki: Undeploy CommunityConfiguration (T368121) (duration: 07m 31s)
14:32 mnz@deploy1002: Finished deploy [airflow-dags/research@b682892]: (no justification provided) (duration: 00m 31s)
14:31 mnz@deploy1002: Started deploy [airflow-dags/research@b682892]: (no justification provided)
14:27 urbanecm@deploy1002: Started scap: Backport for ptwiki: Undeploy CommunityConfiguration (T368121)
14:27 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
14:17 mvernon@cumin1002: conftool action : set/pooled=yes:weight=40; selector: cluster=apus
14:13 urbanecm@deploy1002: Finished scap: Backport for Growth: Enable CommunityConfiguration at idwiki (T366629), Growth: Enable CommunityConfiguration on round 1 wikis (T368121), AX Language selector entrypoint: Fix AX URL (T363183) (duration: 25m 37s)
14:07 urbanecm@deploy1002: kartik, urbanecm: Continuing with sync
14:03 sukhe: running homer in cr*{eqiad*,codfw*} to remove ntp.anycast.wmnet from policies/cr-labs: T366360
13:57 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
13:56 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
13:49 urbanecm@deploy1002: kartik, urbanecm: Backport for Growth: Enable CommunityConfiguration at idwiki (T366629), Growth: Enable CommunityConfiguration on round 1 wikis (T368121), AX Language selector entrypoint: Fix AX URL (T363183) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:47 urbanecm@deploy1002: Started scap: Backport for Growth: Enable CommunityConfiguration at idwiki (T366629), Growth: Enable CommunityConfiguration on round 1 wikis (T368121), AX Language selector entrypoint: Fix AX URL (T363183)
13:47 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
13:44 urbanecm@deploy1002: Finished scap: Backport for mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), CommonSettings: Restore the original behaviour of Reference Previews (T366419), [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258) (duration: 35m 30s)
13:41 vgutierrez: disable puppet on A:cp-magru before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049178 - T364383
13:40 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-magru - T364383
13:40 elukey: [correction] depool cp4037 to deploy a new version of glibc - T367978
13:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on cp4037.ulsfo.wmnet with reason: Upgrade glibc
13:39 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on cp4037.ulsfo.wmnet with reason: Upgrade glibc
13:39 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
13:37 urbanecm@deploy1002: urbanecm, func, dreamyjazz: Continuing with sync
13:35 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
13:35 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
13:32 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4034.ulsfo.wmnet
13:31 elukey: depool cp4034 to deploy a new version of glibc - T367978
13:29 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:29 elukey: uploaded debmonitor-client_0.4.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
13:29 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:24 urbanecm@deploy1002: urbanecm, func, dreamyjazz: Backport for mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), CommonSettings: Restore the original behaviour of Reference Previews (T366419), [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:22 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 32s)
13:22 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
13:08 urbanecm@deploy1002: Started scap: Backport for mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), CommonSettings: Restore the original behaviour of Reference Previews (T366419), [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258)
12:53 vgutierrez: IPIP encapsulation enabled on ldap-ro.codfw.wikimedia.org. - T367861
12:50 vgutierrez: rolling restart of pybal on lvs2014 and lvs2012 - T367861
12:23 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
12:21 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
12:17 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
12:16 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,name=mw1364.eqiad.wmnet
12:16 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,name=mw2276.codfw.wmnet
12:15 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=api_appserver,name=mw1398.eqiad.wmnet
12:15 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,name=mw2299.codfw.wmnet
12:13 moritzm: installing pymysql security updates
12:11 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
12:10 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mwdebug[2001-2002].codfw.wmnet,mwdebug[1001-1002].eqiad.wmnet
12:10 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mwdebug[2001-2002].codfw.wmnet,mwdebug[1001-1002].eqiad.wmnet
12:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 31 hosts with reason: Waiting for reimage to kubernetes
12:10 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:09 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 31 hosts with reason: Waiting for reimage to kubernetes
12:09 claime: Downtiming all legacy api_appserver and appserver - T368058
12:07 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: cluster=api_appserver
12:07 claime: Setting all legacy api_appservers to inactive - T368058
12:07 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: cluster=appserver
12:06 claime: Setting all legacy appservers to inactive - T368058
12:05 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
12:04 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
12:01 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
11:59 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
11:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
11:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
11:55 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2360.codfw.wmnet
11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2360.codfw.wmnet
11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2358.codfw.wmnet
11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2358.codfw.wmnet
11:52 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2339.codfw.wmnet
11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2339.codfw.wmnet
11:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1406.eqiad.wmnet
11:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1406.eqiad.wmnet
11:51 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
11:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1403.eqiad.wmnet
11:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1403.eqiad.wmnet
11:51 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
11:50 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
11:49 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
11:49 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
11:46 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
11:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
11:44 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
11:44 moritzm: installing php8.2 security updates
11:42 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
11:26 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
11:13 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:13 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove AAAA records from an-redacteddb1001 - btullis@cumin1002"
11:01 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove AAAA records from an-redacteddb1001 - btullis@cumin1002"
10:58 btullis@cumin1002: START - Cookbook sre.dns.netbox
10:51 cgoubert@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=jobrunner
10:50 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=jobrunner
10:41 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=videoscaler
10:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1420.eqiad.wmnet
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1420.eqiad.wmnet
10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1407.eqiad.wmnet
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1407.eqiad.wmnet
10:39 claime: pooling mw1420.eqiad.wmnet,mw1407.eqiad.wmnet as videoscalers - T368058
10:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1407.eqiad.wmnet with OS buster
10:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1420.eqiad.wmnet with OS buster
10:14 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
10:10 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
10:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1407.eqiad.wmnet with reason: host reimage
09:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1420.eqiad.wmnet with reason: host reimage
09:57 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
09:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1407.eqiad.wmnet with reason: host reimage
09:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1420.eqiad.wmnet with reason: host reimage
09:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
09:45 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
09:44 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
09:44 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
09:44 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
09:44 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
09:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
09:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1420.eqiad.wmnet with OS buster
09:42 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
09:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
09:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1407.eqiad.wmnet with OS buster
09:39 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
09:34 claime: Reimaging scap::proxies, mediawiki deployments may be unavailable - T368058
09:33 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
09:33 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
09:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:20 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for checker.tools.wmflabs.org
09:20 taavi@cumin1002: START - Cookbook sre.hosts.remove-downtime for checker.tools.wmflabs.org
09:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:17 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on checker.tools.wmflabs.org with reason: rebooting the toolschecker VM
09:17 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on checker.tools.wmflabs.org with reason: rebooting the toolschecker VM
09:10 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:10 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65388 and previous config saved to /var/cache/conftool/dbconfig/20240624-050309-marostegui.json
04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65387 and previous config saved to /var/cache/conftool/dbconfig/20240624-044802-marostegui.json
04:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65386 and previous config saved to /var/cache/conftool/dbconfig/20240624-043254-marostegui.json
04:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65385 and previous config saved to /var/cache/conftool/dbconfig/20240624-041747-marostegui.json
01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65384 and previous config saved to /var/cache/conftool/dbconfig/20240624-015859-marostegui.json
01:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
01:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
01:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65383 and previous config saved to /var/cache/conftool/dbconfig/20240624-015836-marostegui.json
01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65382 and previous config saved to /var/cache/conftool/dbconfig/20240624-014329-marostegui.json
01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65381 and previous config saved to /var/cache/conftool/dbconfig/20240624-012822-marostegui.json
01:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65380 and previous config saved to /var/cache/conftool/dbconfig/20240624-011315-marostegui.json

2024-06-23

22:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65379 and previous config saved to /var/cache/conftool/dbconfig/20240623-225008-marostegui.json
22:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
22:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
22:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65378 and previous config saved to /var/cache/conftool/dbconfig/20240623-224946-marostegui.json
22:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65377 and previous config saved to /var/cache/conftool/dbconfig/20240623-223439-marostegui.json
22:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65376 and previous config saved to /var/cache/conftool/dbconfig/20240623-221932-marostegui.json
22:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65375 and previous config saved to /var/cache/conftool/dbconfig/20240623-220426-marostegui.json
19:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65374 and previous config saved to /var/cache/conftool/dbconfig/20240623-193306-marostegui.json
19:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
19:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
19:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65373 and previous config saved to /var/cache/conftool/dbconfig/20240623-193244-marostegui.json
19:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65372 and previous config saved to /var/cache/conftool/dbconfig/20240623-191737-marostegui.json
19:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65371 and previous config saved to /var/cache/conftool/dbconfig/20240623-190230-marostegui.json
18:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65370 and previous config saved to /var/cache/conftool/dbconfig/20240623-184722-marostegui.json
16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65369 and previous config saved to /var/cache/conftool/dbconfig/20240623-161243-marostegui.json
16:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
16:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65368 and previous config saved to /var/cache/conftool/dbconfig/20240623-161221-marostegui.json
15:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P65367 and previous config saved to /var/cache/conftool/dbconfig/20240623-155714-marostegui.json
15:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P65366 and previous config saved to /var/cache/conftool/dbconfig/20240623-154207-marostegui.json
15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65365 and previous config saved to /var/cache/conftool/dbconfig/20240623-152700-marostegui.json
12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65364 and previous config saved to /var/cache/conftool/dbconfig/20240623-124522-marostegui.json
12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65363 and previous config saved to /var/cache/conftool/dbconfig/20240623-124459-marostegui.json
12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65362 and previous config saved to /var/cache/conftool/dbconfig/20240623-122952-marostegui.json
12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65361 and previous config saved to /var/cache/conftool/dbconfig/20240623-121445-marostegui.json
11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65360 and previous config saved to /var/cache/conftool/dbconfig/20240623-115938-marostegui.json
11:06 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
11:06 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65359 and previous config saved to /var/cache/conftool/dbconfig/20240623-092833-marostegui.json
09:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
09:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65358 and previous config saved to /var/cache/conftool/dbconfig/20240623-092811-marostegui.json
09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65357 and previous config saved to /var/cache/conftool/dbconfig/20240623-091304-marostegui.json
08:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65356 and previous config saved to /var/cache/conftool/dbconfig/20240623-085757-marostegui.json
08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65355 and previous config saved to /var/cache/conftool/dbconfig/20240623-084250-marostegui.json
06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65354 and previous config saved to /var/cache/conftool/dbconfig/20240623-060520-marostegui.json
06:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
06:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
06:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
06:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance

2024-06-22

20:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
20:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
16:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
16:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
16:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65353 and previous config saved to /var/cache/conftool/dbconfig/20240622-161841-marostegui.json
16:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P65352 and previous config saved to /var/cache/conftool/dbconfig/20240622-160333-marostegui.json
15:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P65351 and previous config saved to /var/cache/conftool/dbconfig/20240622-154826-marostegui.json
15:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65350 and previous config saved to /var/cache/conftool/dbconfig/20240622-153318-marostegui.json
12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65349 and previous config saved to /var/cache/conftool/dbconfig/20240622-120437-marostegui.json
12:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
12:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65348 and previous config saved to /var/cache/conftool/dbconfig/20240622-120404-marostegui.json
11:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P65347 and previous config saved to /var/cache/conftool/dbconfig/20240622-114857-marostegui.json
11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P65346 and previous config saved to /var/cache/conftool/dbconfig/20240622-113350-marostegui.json
11:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65345 and previous config saved to /var/cache/conftool/dbconfig/20240622-111842-marostegui.json
06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65344 and previous config saved to /var/cache/conftool/dbconfig/20240622-064802-marostegui.json
06:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
06:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65343 and previous config saved to /var/cache/conftool/dbconfig/20240622-064739-marostegui.json
06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P65342 and previous config saved to /var/cache/conftool/dbconfig/20240622-063232-marostegui.json
06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P65341 and previous config saved to /var/cache/conftool/dbconfig/20240622-061725-marostegui.json
06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65340 and previous config saved to /var/cache/conftool/dbconfig/20240622-060216-marostegui.json
05:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2197.codfw.wmnet with reason: Long schema change
05:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2197.codfw.wmnet with reason: Long schema change
01:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65339 and previous config saved to /var/cache/conftool/dbconfig/20240622-015020-marostegui.json
01:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
01:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
01:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65338 and previous config saved to /var/cache/conftool/dbconfig/20240622-014958-marostegui.json
01:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P65337 and previous config saved to /var/cache/conftool/dbconfig/20240622-013451-marostegui.json
01:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P65336 and previous config saved to /var/cache/conftool/dbconfig/20240622-011943-marostegui.json
01:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65335 and previous config saved to /var/cache/conftool/dbconfig/20240622-010436-marostegui.json

2024-06-21

23:54 cwhite: delete remaining 2024.03 log indexes to make room on logstash eqiad and codfw T368180
23:43 brett@puppetmaster1001: dbctl commit (dc=all): 'set db1206 s1 weight to 1 - T368098', diff saved to https://phabricator.wikimedia.org/P65334 and previous config saved to /var/cache/conftool/dbconfig/20240621-234328-brett.json
23:28 brett: # dbctl instance db1206 set-weight 10 --section s1
21:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65333 and previous config saved to /var/cache/conftool/dbconfig/20240621-213503-marostegui.json
21:31 cwhite: restart apache2 on gerrit1003
21:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P65332 and previous config saved to /var/cache/conftool/dbconfig/20240621-211956-marostegui.json
21:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P65331 and previous config saved to /var/cache/conftool/dbconfig/20240621-210448-marostegui.json
20:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65330 and previous config saved to /var/cache/conftool/dbconfig/20240621-204941-marostegui.json
20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65329 and previous config saved to /var/cache/conftool/dbconfig/20240621-203659-marostegui.json
20:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
20:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
20:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65328 and previous config saved to /var/cache/conftool/dbconfig/20240621-203636-marostegui.json
20:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P65327 and previous config saved to /var/cache/conftool/dbconfig/20240621-202129-marostegui.json
20:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P65326 and previous config saved to /var/cache/conftool/dbconfig/20240621-200622-marostegui.json
19:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65325 and previous config saved to /var/cache/conftool/dbconfig/20240621-195115-marostegui.json
19:43 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
16:00 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:59 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:41 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
15:41 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
15:40 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
15:39 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
15:37 elukey@cumin1002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox-canary
15:37 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65322 and previous config saved to /var/cache/conftool/dbconfig/20240621-152038-marostegui.json
15:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
15:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
15:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
15:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65321 and previous config saved to /var/cache/conftool/dbconfig/20240621-152011-marostegui.json
15:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P65319 and previous config saved to /var/cache/conftool/dbconfig/20240621-150504-marostegui.json
15:01 ejegg: fundraising civicrm upgraded from 8a0b5bea to 13a13f3a
14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P65318 and previous config saved to /var/cache/conftool/dbconfig/20240621-144957-marostegui.json
14:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65317 and previous config saved to /var/cache/conftool/dbconfig/20240621-143450-marostegui.json
14:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65314 and previous config saved to /var/cache/conftool/dbconfig/20240621-141050-marostegui.json
14:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
14:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
14:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65313 and previous config saved to /var/cache/conftool/dbconfig/20240621-141028-marostegui.json
13:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P65312 and previous config saved to /var/cache/conftool/dbconfig/20240621-135521-marostegui.json
13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P65309 and previous config saved to /var/cache/conftool/dbconfig/20240621-134013-marostegui.json
13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65306 and previous config saved to /var/cache/conftool/dbconfig/20240621-132506-marostegui.json
13:21 btullis@deploy1002: Finished deploy [performance/asoranking@febfb9f]: (no justification provided) (duration: 00m 04s)
13:21 btullis@deploy1002: Started deploy [performance/asoranking@febfb9f]: (no justification provided)
13:08 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
13:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
13:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
13:07 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
13:07 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
13:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
11:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) shellbox-video.discovery.wmnet on all recursors
11:37 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache shellbox-video.discovery.wmnet on all recursors
11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65303 and previous config saved to /var/cache/conftool/dbconfig/20240621-110638-marostegui.json
11:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
11:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
10:57 Emperor: restart swift-proxy on ms-fe2011 ms-fe2012 T360913
10:56 Emperor: restart swift-proxy on ms-fe1010 T360913
10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2002.codfw.wmnet
10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2001.codfw.wmnet
10:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65302 and previous config saved to /var/cache/conftool/dbconfig/20240621-100554-marostegui.json
10:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
10:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65301 and previous config saved to /var/cache/conftool/dbconfig/20240621-100531-marostegui.json
09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65300 and previous config saved to /var/cache/conftool/dbconfig/20240621-095024-marostegui.json
09:45 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
09:45 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
09:41 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65299 and previous config saved to /var/cache/conftool/dbconfig/20240621-093517-marostegui.json
09:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ using stat1009.eqiad.wmnet)
09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65298 and previous config saved to /var/cache/conftool/dbconfig/20240621-092009-marostegui.json
09:16 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
09:14 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
09:02 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
08:57 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
08:56 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
08:47 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1053.eqiad.wmnet
08:41 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
08:39 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudvirt1053.eqiad.wmnet
08:14 vgutierrez: restarting logrotate.service on cp[3068,3070-3071].esams.wmnet
08:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
08:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
08:03 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
08:03 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
08:00 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
08:00 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
07:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65297 and previous config saved to /var/cache/conftool/dbconfig/20240621-075404-arnaudb.json
07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65296 and previous config saved to /var/cache/conftool/dbconfig/20240621-073858-arnaudb.json
07:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65295 and previous config saved to /var/cache/conftool/dbconfig/20240621-072353-arnaudb.json
07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65294 and previous config saved to /var/cache/conftool/dbconfig/20240621-070847-arnaudb.json
07:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool for debugging T368098', diff saved to https://phabricator.wikimedia.org/P65293 and previous config saved to /var/cache/conftool/dbconfig/20240621-070358-arnaudb.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65292 and previous config saved to /var/cache/conftool/dbconfig/20240621-045107-marostegui.json
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
04:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
04:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65291 and previous config saved to /var/cache/conftool/dbconfig/20240621-045044-marostegui.json
04:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
04:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65290 and previous config saved to /var/cache/conftool/dbconfig/20240621-044455-marostegui.json
04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65289 and previous config saved to /var/cache/conftool/dbconfig/20240621-043537-marostegui.json
04:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65288 and previous config saved to /var/cache/conftool/dbconfig/20240621-042948-marostegui.json
04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65287 and previous config saved to /var/cache/conftool/dbconfig/20240621-042030-marostegui.json
04:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65286 and previous config saved to /var/cache/conftool/dbconfig/20240621-041441-marostegui.json
04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65285 and previous config saved to /var/cache/conftool/dbconfig/20240621-040523-marostegui.json
03:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65284 and previous config saved to /var/cache/conftool/dbconfig/20240621-035934-marostegui.json
03:04 ejegg: fundraising civicrm upgraded from 2e1db811 to 8a0b5bea
01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65283 and previous config saved to /var/cache/conftool/dbconfig/20240621-014545-marostegui.json
01:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
01:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65282 and previous config saved to /var/cache/conftool/dbconfig/20240621-014523-marostegui.json
01:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65281 and previous config saved to /var/cache/conftool/dbconfig/20240621-013016-marostegui.json
01:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65280 and previous config saved to /var/cache/conftool/dbconfig/20240621-011509-marostegui.json
01:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65279 and previous config saved to /var/cache/conftool/dbconfig/20240621-010002-marostegui.json
00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65278 and previous config saved to /var/cache/conftool/dbconfig/20240621-005237-ladsgroup.json
00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65277 and previous config saved to /var/cache/conftool/dbconfig/20240621-003730-ladsgroup.json
00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65276 and previous config saved to /var/cache/conftool/dbconfig/20240621-002223-ladsgroup.json
00:08 mutante: [cp3072:~] $ sudo systemctl start varnishkafka-webrequest.service
00:08 mutante: [cp3067:~] $ sudo systemctl start logrotate
00:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65275 and previous config saved to /var/cache/conftool/dbconfig/20240621-000716-ladsgroup.json
00:00 sukhe: restarting haproxy on cp3068 and cp3072

2024-06-20

23:47 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 10m 12s)
23:36 zabe@deploy1002: Started scap: Update interwiki cache
23:35 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=btmwiki --cluster=all 2>&1 | tee /tmp/btmwiki.UpdateSearchIndexConfig.log # T368038
23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65274 and previous config saved to /var/cache/conftool/dbconfig/20240620-233346-marostegui.json
23:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
23:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65273 and previous config saved to /var/cache/conftool/dbconfig/20240620-233324-marostegui.json
23:33 zabe@deploy1002: Finished scap: Creating btmwiki (T368038) (duration: 12m 20s)
23:20 zabe@deploy1002: Started scap: Creating btmwiki (T368038)
23:20 zabe: create Wikipedia Mandailing # T368038
23:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65272 and previous config saved to /var/cache/conftool/dbconfig/20240620-231817-marostegui.json
23:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65271 and previous config saved to /var/cache/conftool/dbconfig/20240620-230310-marostegui.json
22:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65270 and previous config saved to /var/cache/conftool/dbconfig/20240620-224803-marostegui.json
22:39 mutante: aphlict1002/aphlict2001 - systemctl stop aphlict_logrotate.timer (and .service); systemctl disable aphlict_logrotate.timer (and .service); systemctl daemon-reload; systemctl reset-failed T367960
22:33 zabe@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T361041 T363825 T366649 (duration: 09m 55s)
22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65269 and previous config saved to /var/cache/conftool/dbconfig/20240620-222909-marostegui.json
22:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
22:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65268 and previous config saved to /var/cache/conftool/dbconfig/20240620-222847-marostegui.json
22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65267 and previous config saved to /var/cache/conftool/dbconfig/20240620-221340-marostegui.json
21:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65266 and previous config saved to /var/cache/conftool/dbconfig/20240620-215833-marostegui.json
21:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65265 and previous config saved to /var/cache/conftool/dbconfig/20240620-214326-marostegui.json
21:12 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
21:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
21:11 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
21:10 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
21:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
21:09 brett: Include ncmonitor 1.0.0 in wikimedia-bookworm apt repo
21:09 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:08 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:08 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
21:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
21:07 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:06 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
21:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
21:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
21:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
21:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
21:03 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
20:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
20:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
20:44 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
20:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
20:43 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
20:42 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
20:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
20:40 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
20:39 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
20:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
20:36 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
20:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
20:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
20:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
20:28 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
20:26 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
20:25 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
20:24 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
19:58 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
19:58 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
19:56 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
19:55 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
19:54 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
19:52 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
19:51 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
19:18 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1105* for T348977 - bking@cumin2002
19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105* for T348977 - bking@cumin2002
19:18 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1105 for T348977 - bking@cumin2002
19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105 for T348977 - bking@cumin2002
19:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2088.codfw.wmnet
19:01 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
18:58 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
18:21 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65263 and previous config saved to /var/cache/conftool/dbconfig/20240620-181635-marostegui.json
18:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
18:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65262 and previous config saved to /var/cache/conftool/dbconfig/20240620-181613-marostegui.json
18:06 inflatador: bking@an-airflow1007 install `ripgrep` deb pkg
18:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65261 and previous config saved to /var/cache/conftool/dbconfig/20240620-180104-marostegui.json
17:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
17:48 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65260 and previous config saved to /var/cache/conftool/dbconfig/20240620-174557-marostegui.json
17:44 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic2088.codfw.wmnet
17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65259 and previous config saved to /var/cache/conftool/dbconfig/20240620-174125-ladsgroup.json
17:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
17:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
17:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65258 and previous config saved to /var/cache/conftool/dbconfig/20240620-173050-marostegui.json
17:30 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
17:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
17:15 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
17:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
16:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65256 and previous config saved to /var/cache/conftool/dbconfig/20240620-163348-arnaudb.json
16:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bookworm
16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 50%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65254 and previous config saved to /var/cache/conftool/dbconfig/20240620-161842-arnaudb.json
16:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Fix Special:Notifications (T368029) (duration: 12m 21s)
16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Continuing with sync
16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Backport for Fix Special:Notifications (T368029) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
16:06 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
16:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
16:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Fix Special:Notifications (T368029)
16:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
16:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 25%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65253 and previous config saved to /var/cache/conftool/dbconfig/20240620-160337-arnaudb.json
16:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
16:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
16:01 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
16:00 claime: Repooling and uncordoning mw2282.codfw.wmnet following move - T361856
15:59 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
15:59 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:58 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:57 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2019.codfw.wmnet|wikikube-worker2020.codfw.wmnet|wikikube-worker2021.codfw.wmnet|wikikube-worker2022.codfw.wmnet|wikikube-worker2023.codfw.wmnet|wikikube-worker2024.codfw.wmnet),cluster=kubernetes,service=kubesvc
15:57 claime: Pooling and uncordoning wikikube-worker2019.codfw.wmnet,wikikube-worker2020.codfw.wmnet,wikikube-worker2021.codfw.wmnet,wikikube-worker2022.codfw.wmnet,wikikube-worker2023.codfw.wmnet,wikikube-worker2024.codfw.wmnet - T351074
15:55 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
15:52 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
15:48 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65252 and previous config saved to /var/cache/conftool/dbconfig/20240620-154831-arnaudb.json
15:46 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
15:46 claime: homer 'cr*codfw*' commit 'T351074'
15:45 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl2002.codfw.wmnet
15:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bookworm
15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2019.codfw.wmnet with OS bullseye
15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2020.codfw.wmnet with OS bullseye
15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2022.codfw.wmnet with OS bullseye
15:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 5%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65251 and previous config saved to /var/cache/conftool/dbconfig/20240620-153326-arnaudb.json
15:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2024.codfw.wmnet with OS bullseye
15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2023.codfw.wmnet with OS bullseye
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2405.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2405.codfw.wmnet
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2404.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2404.codfw.wmnet
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2403.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2403.codfw.wmnet
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2400.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2400.codfw.wmnet
15:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2021.codfw.wmnet with OS bullseye
15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
15:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 2%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65249 and previous config saved to /var/cache/conftool/dbconfig/20240620-151820-arnaudb.json
15:18 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
15:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
15:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
15:06 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:04 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
15:02 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 04m 15s)
15:01 topranks: rebooting lsw1-e6-eqiad to upgrade JunOS on switch T365987
15:01 jhathaway@deploy1002: Started scap: (no justification provided)
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lsw1-f6-eqiad.mgmt
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for lsw1-f6-eqiad.mgmt
14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:56 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1018.eqiad.wmnet
14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1018.eqiad.wmnet
14:54 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
14:53 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
14:53 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 6 hosts
14:48 sukhe: homer "*" commit "rolling out NTP ACL change"
14:48 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2024.codfw.wmnet with OS bullseye
14:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65248 and previous config saved to /var/cache/conftool/dbconfig/20240620-144750-arnaudb.json
14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2023.codfw.wmnet with OS bullseye
14:47 vgutierrez: rolling restart of pybal on lvs1020 and lvs1018 - T367511
14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2022.codfw.wmnet with OS bullseye
14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2021.codfw.wmnet with OS bullseye
14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2020.codfw.wmnet with OS bullseye
14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2364 to wikikube-worker2024
14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2024
14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2019.codfw.wmnet with OS bullseye
14:46 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2024
14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65247 and previous config saved to /var/cache/conftool/dbconfig/20240620-144423-marostegui.json
14:44 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:43 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
14:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65246 and previous config saved to /var/cache/conftool/dbconfig/20240620-144341-marostegui.json
14:42 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2364 to wikikube-worker2024
14:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
14:39 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2363 to wikikube-worker2023
14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2023
14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1051.eqiad.wmnet with OS bookworm
14:38 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
14:37 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2023
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
14:37 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2324.codfw.wmnet
14:37 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2324.codfw.wmnet
14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2323.codfw.wmnet
14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2323.codfw.wmnet
14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1489.eqiad.wmnet
14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1489.eqiad.wmnet
14:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:35 sukhe: running authdns-update for CR 1047074
14:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
14:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65245 and previous config saved to /var/cache/conftool/dbconfig/20240620-143244-arnaudb.json
14:32 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:32 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2363 to wikikube-worker2023
14:31 moritzm: imported python-pymysql 1.0.2-2~wmf11u2 to apt.wikimedia.org (merge of the security fix from DSA 5700 on top of our internal backport)
14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 depool ahead of T365987', diff saved to https://phabricator.wikimedia.org/P65244 and previous config saved to /var/cache/conftool/dbconfig/20240620-143109-arnaudb.json
14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
14:30 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2362 to wikikube-worker2022
14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2022
14:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:28 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2022
14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
14:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65243 and previous config saved to /var/cache/conftool/dbconfig/20240620-142834-marostegui.json
14:27 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
14:27 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:26 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
14:25 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:25 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:25 elukey@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
14:25 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2362 to wikikube-worker2022
14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:24 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2360 to wikikube-worker2021
14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2021
14:21 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2021
14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
14:19 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65242 and previous config saved to /var/cache/conftool/dbconfig/20240620-141739-arnaudb.json
14:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: IPIP migration
14:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:17 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: IPIP migration
14:17 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2360 to wikikube-worker2021
14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2358 to wikikube-worker2020
14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2020
14:15 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2020
14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65241 and previous config saved to /var/cache/conftool/dbconfig/20240620-141328-marostegui.json
14:13 elukey@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
14:13 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
14:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2358 to wikikube-worker2020
14:10 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
14:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P65240 and previous config saved to /var/cache/conftool/dbconfig/20240620-141010-root.json
14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2339 to wikikube-worker2019
14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2019
14:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2019
14:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
14:07 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
14:07 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
14:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2339 to wikikube-worker2019
14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65239 and previous config saved to /var/cache/conftool/dbconfig/20240620-140233-arnaudb.json
14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1049.eqiad.wmnet with OS bookworm
14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1050.eqiad.wmnet with OS bookworm
13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65238 and previous config saved to /var/cache/conftool/dbconfig/20240620-135820-marostegui.json
13:57 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
13:56 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
13:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65237 and previous config saved to /var/cache/conftool/dbconfig/20240620-135610-marostegui.json
13:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
13:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65236 and previous config saved to /var/cache/conftool/dbconfig/20240620-135559-marostegui.json
13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
13:55 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
13:54 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
13:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P65235 and previous config saved to /var/cache/conftool/dbconfig/20240620-135438-root.json
13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
13:53 claime: Depooling mw2339.codfw.wmnet,mw2358.codfw.wmnet,mw2360.codfw.wmnet,mw2362.codfw.wmnet,mw2363.codfw.wmnet,mw2364.codfw.wmnet for reimage to k8s - T351074
13:53 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
13:52 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
13:52 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
13:51 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
13:51 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:50 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
13:50 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1051.eqiad.wmnet with OS bookworm
13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65234 and previous config saved to /var/cache/conftool/dbconfig/20240620-134728-arnaudb.json
13:46 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65233 and previous config saved to /var/cache/conftool/dbconfig/20240620-134052-marostegui.json
13:39 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P65232 and previous config saved to /var/cache/conftool/dbconfig/20240620-133907-root.json
13:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
13:32 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
13:28 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
13:28 hashar@deploy1002: Finished deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2 (duration: 00m 06s)
13:28 hashar@deploy1002: Started deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2
13:27 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
13:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65231 and previous config saved to /var/cache/conftool/dbconfig/20240620-132545-marostegui.json
13:24 reedy@deploy1002: Synchronized wmf-config/: T368003 (duration: 10m 39s)
13:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
13:23 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P65230 and previous config saved to /var/cache/conftool/dbconfig/20240620-132335-root.json
13:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
13:22 elukey: upload dragonfly packages 1.0.6-2 to bookworm-wikimedia - T365253
13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65228 and previous config saved to /var/cache/conftool/dbconfig/20240620-131038-marostegui.json
13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65227 and previous config saved to /var/cache/conftool/dbconfig/20240620-131031-marostegui.json
13:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
13:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65226 and previous config saved to /var/cache/conftool/dbconfig/20240620-130928-marostegui.json
13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1050.eqiad.wmnet with OS bookworm
13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1049.eqiad.wmnet with OS bookworm
13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
13:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:08 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P65225 and previous config saved to /var/cache/conftool/dbconfig/20240620-130804-root.json
13:07 sukhe: running homer on cr*{eqiad,codfw}* for CR 1046737: update policies/cr-labs.yaml for new NTP servers: T366360
13:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1002.eqiad.wmnet
13:05 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2003.codfw.wmnet
13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1002.eqiad.wmnet
13:00 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2003.codfw.wmnet
12:54 sukhe: sudo cumin -b1 -s30 "A:installserver" "run-puppet-agent": T366360
12:51 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 5%: 1', diff saved to https://phabricator.wikimedia.org/P65223 and previous config saved to /var/cache/conftool/dbconfig/20240620-125139-root.json
12:51 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
12:44 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
12:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1048.eqiad.wmnet with OS bookworm
12:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bookworm
12:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
12:08 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
12:06 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
12:04 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
11:52 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
11:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1048.eqiad.wmnet with OS bookworm
11:47 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bookworm
11:41 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
11:38 XioNoX: merge netbox-extra CR1038869 - Fix lots of CI errors
11:33 jgiannelos@deploy1002: Finished deploy [restbase/deploy@f867c66]: (no justification provided) (duration: 30m 12s)
11:27 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
11:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
11:25 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
11:25 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
11:21 akosiaris: upgrade mathoid to 2024-06-18-233457-production T349118
11:20 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: sync
11:20 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: sync
11:03 jgiannelos@deploy1002: Started deploy [restbase/deploy@f867c66]: (no justification provided)
10:57 dreamyjazz@deploy1002: Finished scap: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170) (duration: 15m 03s)
10:48 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
10:44 dreamyjazz@deploy1002: dreamyjazz: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:42 dreamyjazz@deploy1002: Started scap: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170)
10:41 Amir1: running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis (T308084)
10:37 dreamyjazz@deploy1002: Finished scap: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170) (duration: 13m 49s)
10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bookworm
10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bookworm
10:31 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
10:31 claime: repooling and uncordoning mw2321.codfw.wmnet - T367862
10:31 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2321.codfw.wmnet
10:30 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2321.codfw.wmnet
10:28 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
10:25 dreamyjazz@deploy1002: dreamyjazz: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:24 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
10:23 dreamyjazz@deploy1002: Started scap: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170)
10:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
10:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
10:20 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
10:19 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
10:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
10:18 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
10:17 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
10:16 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
10:16 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
10:15 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
10:14 claime: Draining and depooling mw2321.codfw.wmnet to test 1047031 - T367862
10:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
10:04 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
10:04 claime: Running puppet on A:wikikube-worker
10:02 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
10:01 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
10:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
10:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
09:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
09:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
09:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
09:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
09:47 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bookworm
09:45 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:16 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php sysop_plwiki AramilFeraxa REDACTED --bureaucrat --sysop # T361041
08:57 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:51 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
08:50 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:49 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
08:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
08:33 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:23 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
08:16 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.10 refs T361404
08:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
08:08 moritzm: reboot of irc1001 to nudge clients to re-connect to the new bullseye host T331702
08:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
08:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
07:53 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
07:53 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
07:53 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
07:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
07:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
07:04 moritzm: failover irc.wikimedia.org to the new Bullseye servers T331702
06:04 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
06:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
05:27 marostegui: Deploy schema change on old s7 eqiad master dbmaint (db1236) T364299
05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1236 T367857', diff saved to https://phabricator.wikimedia.org/P65220 and previous config saved to /var/cache/conftool/dbconfig/20240620-052359-root.json
05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1181 to s7 primary and set section read-write T367857', diff saved to https://phabricator.wikimedia.org/P65219 and previous config saved to /var/cache/conftool/dbconfig/20240620-052253-marostegui.json
05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T367857', diff saved to https://phabricator.wikimedia.org/P65218 and previous config saved to /var/cache/conftool/dbconfig/20240620-052230-marostegui.json
05:22 marostegui: Starting s7 eqiad failover from db1236 to db1181 - T367857
05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
05:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
05:04 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1181 with weight 0 T367857', diff saved to https://phabricator.wikimedia.org/P65217 and previous config saved to /var/cache/conftool/dbconfig/20240620-050428-marostegui.json
05:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
02:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367856)', diff saved to https://phabricator.wikimedia.org/P65216 and previous config saved to /var/cache/conftool/dbconfig/20240620-022416-marostegui.json
02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
02:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
02:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
02:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65215 and previous config saved to /var/cache/conftool/dbconfig/20240620-022349-marostegui.json
02:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65214 and previous config saved to /var/cache/conftool/dbconfig/20240620-020842-marostegui.json
01:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65213 and previous config saved to /var/cache/conftool/dbconfig/20240620-015335-marostegui.json
01:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65212 and previous config saved to /var/cache/conftool/dbconfig/20240620-013827-marostegui.json

2024-06-19

23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php arbcom_itwiki Superpes15 REDACTED --bureaucrat --sysop
23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php u4cwiki Superpes15 REDACTED --bureaucrat --sysop
21:08 oblivian@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
21:08 oblivian@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
20:33 zabe@deploy1002: Finished scap: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431) (duration: 14m 41s)
20:24 zabe@deploy1002: superpes, zabe: Continuing with sync
20:23 zabe@deploy1002: superpes, zabe: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:19 zabe@deploy1002: Started scap: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431)
19:08 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
19:05 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
18:54 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
18:51 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
18:49 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
18:48 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
18:40 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
18:35 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
18:34 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65211 and previous config saved to /var/cache/conftool/dbconfig/20240619-182922-marostegui.json
18:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
18:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65210 and previous config saved to /var/cache/conftool/dbconfig/20240619-182900-marostegui.json
18:21 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
18:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65209 and previous config saved to /var/cache/conftool/dbconfig/20240619-181353-marostegui.json
17:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65208 and previous config saved to /var/cache/conftool/dbconfig/20240619-175846-marostegui.json
17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65207 and previous config saved to /var/cache/conftool/dbconfig/20240619-174338-marostegui.json
17:21 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:21 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
17:20 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
17:13 kamila@cumin1002: START - Cookbook sre.dns.netbox
17:05 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1044.eqiad.wmnet with OS bookworm
17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2002
17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2002
17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2001
17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2001
17:00 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:00 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
16:59 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
16:42 sukhe: sudo cumin 'A:durum' 'run-puppet-agent' to switch timesyncd NTP pools to ntp-[abc].anycast.wmnet: T366360
16:27 claime: pooling and uncordoning mw2321.codfw.wmnet - T367702
16:27 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
16:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: service=(ntp-a|ntp-b|ntp-c)
16:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
16:12 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
16:09 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
16:03 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
15:55 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
15:55 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
15:51 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:50 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:46 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:46 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
15:45 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
15:44 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
15:32 taavi@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1042
15:32 taavi@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1042
15:24 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:24 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:23 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:22 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:16 sukhe: sudo cumin -b1 -s120 'A:dnsbox' 'run-puppet-agent --enable "merging CR 1046685"': T366360
15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
15:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
15:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bookworm
15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2006.wikimedia.org,service=ntp-c
15:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:01 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
15:01 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
15:00 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
14:59 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.remove-downtime (exit_code=97) for wikikube-worker2003.codfw.wmnet
14:59 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
14:42 marostegui: Deploy schema change on s2 eqiad master dbmaint T364069
14:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
14:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
14:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bookworm
14:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
14:38 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
14:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
14:36 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
14:35 moritzm: installing nano security updates
14:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
14:24 moritzm: installing libvpx security updates
14:23 moritzm: installing pymysql security updates
14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
14:19 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
14:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
14:14 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
14:12 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
14:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
14:11 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2003.codfw.wmnet with OS bookworm
14:10 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
14:09 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:09 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
14:09 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:08 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
14:08 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:07 taavi@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:07 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
13:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
13:54 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
13:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
13:53 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
13:51 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:50 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:49 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:48 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
13:42 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:41 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
13:41 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
13:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
13:35 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
13:35 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1043.eqiad.wmnet with OS bookworm
13:35 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
13:32 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
13:32 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
13:32 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=ntp-a
13:31 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
13:31 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
13:28 sukhe: enable puppet on dns6001 to test CR 1046685
13:23 sukhe: sudo cumin 'A:dnsbox' 'disable-puppet "merging CR 1046685"': T366360
13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:21 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mw2282.codfw.wmnet with reason: host move
13:21 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on mw2282.codfw.wmnet with reason: host move
13:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
13:17 kamila_: drained mw2282.codfw.wmnet for T361856
13:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
13:06 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
13:04 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: service=ntp-[abc]
13:04 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
12:52 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:51 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2011.codfw.wmnet|wikikube-worker2012.codfw.wmnet|wikikube-worker2013.codfw.wmnet|wikikube-worker2014.codfw.wmnet|wikikube-worker2017.codfw.wmnet|wikikube-worker2018.codfw.wmnet),cluster=kubernetes,service=kubesvc
12:40 claime: homer 'cr*codfw*' commit 'T351074'
12:38 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
12:38 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
12:37 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:37 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:36 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
12:35 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
12:34 klausman: Puppet management of install2004 restored, lpxelinux.0 also restored.
12:24 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
12:22 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:21 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
12:20 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:19 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
12:17 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
12:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
12:14 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
12:13 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
12:12 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
12:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
12:11 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
12:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
12:08 klausman: Will test-replace the PXE chainloader (/srv/tftpboot/lpxelinux.0) on install2003 with a newer version to see if it fixes the ldlinux.c32 error. Puppet will be disabled on that machine for the duration.
12:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
12:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
12:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
12:03 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
12:03 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
12:02 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65204 and previous config saved to /var/cache/conftool/dbconfig/20240619-120142-root.json
12:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
12:01 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
12:00 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
12:00 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
11:57 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
11:57 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
11:50 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65203 and previous config saved to /var/cache/conftool/dbconfig/20240619-114636-root.json
11:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin
11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65201 and previous config saved to /var/cache/conftool/dbconfig/20240619-113131-root.json
11:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
11:18 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host netbox-dev2003.codfw.wmnet
11:18 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netbox-dev2003.codfw.wmnet with OS bookworm
11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2012.codfw.wmnet with OS bullseye
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65200 and previous config saved to /var/cache/conftool/dbconfig/20240619-111625-root.json
11:15 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2013.codfw.wmnet with OS bullseye
11:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
11:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:12 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2014.codfw.wmnet with OS bullseye
11:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:08 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2017.codfw.wmnet with OS bullseye
11:07 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:07 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:06 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2018.codfw.wmnet with OS bullseye
11:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2011.codfw.wmnet with OS bullseye
11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65199 and previous config saved to /var/cache/conftool/dbconfig/20240619-110120-root.json
10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
10:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65198 and previous config saved to /var/cache/conftool/dbconfig/20240619-104614-root.json
10:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
10:40 jmm@deploy1002: Finished scap: (no justification provided) (duration: 04m 03s)
10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
10:36 jmm@deploy1002: Started scap: (no justification provided)
10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65197 and previous config saved to /var/cache/conftool/dbconfig/20240619-103109-root.json
10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2018.codfw.wmnet with OS bullseye
10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2017.codfw.wmnet with OS bullseye
10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65196 and previous config saved to /var/cache/conftool/dbconfig/20240619-102504-marostegui.json
10:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2014.codfw.wmnet with OS bullseye
10:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2013.codfw.wmnet with OS bullseye
10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2012.codfw.wmnet with OS bullseye
10:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2011.codfw.wmnet with OS bullseye
10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2409 to wikikube-worker2018
10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2018
10:22 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2018
10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
10:18 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:18 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2409 to wikikube-worker2018
10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2408 to wikikube-worker2017
10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2017
10:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2017
10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
10:16 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
10:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65195 and previous config saved to /var/cache/conftool/dbconfig/20240619-101625-marostegui.json
10:14 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:14 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2408 to wikikube-worker2017
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2405 to wikikube-worker2014
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2014
10:12 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2014
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
10:09 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
10:06 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:06 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2405 to wikikube-worker2014
10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2404 to wikikube-worker2013
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2013
10:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
10:05 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2013
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
10:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65194 and previous config saved to /var/cache/conftool/dbconfig/20240619-100118-marostegui.json
10:00 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
10:00 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2404 to wikikube-worker2013
09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2403 to wikikube-worker2012
09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2012
09:55 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
09:53 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2012
09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
09:51 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
09:51 ayounsi@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
09:47 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
09:47 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:47 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2403 to wikikube-worker2012
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2400 to wikikube-worker2011
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2011
09:46 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2011
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
09:44 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
09:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2400 to wikikube-worker2011
09:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin
09:34 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
09:22 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:21 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
09:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
09:15 claime: Depooling mw2400.codfw.wmnet,mw2403.codfw.wmnet,mw2404.codfw.wmnet,mw2405.codfw.wmnet,mw2408.codfw.wmnet,mw2409.codfw.wmnet for reimage - T351074
09:13 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
09:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
09:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2001.codfw.wmnet with OS bookworm
09:01 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5025.*} and A:cp
08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1001.eqiad.wmnet with OS bookworm
08:58 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5025.*} and A:cp
08:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
08:54 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
08:52 fabfur: upgrading eqsin cp hosts to haproxy 2.8.10 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047436) (T367756)
08:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
08:48 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
08:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15830
08:38 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
08:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
08:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
08:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bookworm
08:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:24 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:23 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
08:23 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1001.eqiad.wmnet with OS bookworm
08:18 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.10 refs T361404
08:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
08:11 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
08:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:03 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
08:01 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netbox-dev2003.codfw.wmnet with OS bookworm
08:00 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:59 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox-dev2003.codfw.wmnet on all recursors
07:59 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netbox-dev2003.codfw.wmnet on all recursors
07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:57 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
07:54 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
07:54 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netbox-dev2003.codfw.wmnet
07:48 kartik@deploy1002: Finished scap: Backport for igwiki: Enable MinT for Wikipedia readers (T363464) (duration: 18m 55s)
07:38 kartik@deploy1002: kartik: Continuing with sync
07:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
07:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
07:33 kartik@deploy1002: kartik: Backport for igwiki: Enable MinT for Wikipedia readers (T363464) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:29 kartik@deploy1002: Started scap: Backport for igwiki: Enable MinT for Wikipedia readers (T363464)
07:22 kartik@deploy1002: Finished scap: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) (duration: 20m 12s)
07:20 marostegui: Deploy schema change on old s7 eqiad master db1160 dbmaint T364069
07:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65192 and previous config saved to /var/cache/conftool/dbconfig/20240619-071516-root.json
07:12 kartik@deploy1002: kartik: Continuing with sync
07:07 kartik@deploy1002: kartik: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:02 kartik@deploy1002: Started scap: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852)
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65191 and previous config saved to /var/cache/conftool/dbconfig/20240619-070010-root.json
06:52 jynus: stop db1240:s1, wipe and reimport db1240:s3 T367162
06:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65190 and previous config saved to /var/cache/conftool/dbconfig/20240619-064505-root.json
06:40 XioNoX: merge Puppet "Prepare for netbox-dev" CR1047081
06:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65189 and previous config saved to /var/cache/conftool/dbconfig/20240619-063337-root.json
06:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65188 and previous config saved to /var/cache/conftool/dbconfig/20240619-062959-root.json
06:21 _joe_: upgrading conftool everywhere T367919
06:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65187 and previous config saved to /var/cache/conftool/dbconfig/20240619-061831-root.json
06:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P65186 and previous config saved to /var/cache/conftool/dbconfig/20240619-061721-root.json
06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65185 and previous config saved to /var/cache/conftool/dbconfig/20240619-061454-root.json
06:08 _joe_: uploaded newer python-conftool packages T367919
06:05 _joe_: deleting manually thirdparty/conda repositories from reprepro T364550
06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65184 and previous config saved to /var/cache/conftool/dbconfig/20240619-060326-root.json
06:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P65183 and previous config saved to /var/cache/conftool/dbconfig/20240619-060216-root.json
05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65182 and previous config saved to /var/cache/conftool/dbconfig/20240619-055948-root.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65181 and previous config saved to /var/cache/conftool/dbconfig/20240619-054820-root.json
05:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P65180 and previous config saved to /var/cache/conftool/dbconfig/20240619-054710-root.json
05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65179 and previous config saved to /var/cache/conftool/dbconfig/20240619-054443-root.json
05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65178 and previous config saved to /var/cache/conftool/dbconfig/20240619-054259-root.json
05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65177 and previous config saved to /var/cache/conftool/dbconfig/20240619-054214-marostegui.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65176 and previous config saved to /var/cache/conftool/dbconfig/20240619-053315-root.json
05:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P65175 and previous config saved to /var/cache/conftool/dbconfig/20240619-053205-root.json
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65174 and previous config saved to /var/cache/conftool/dbconfig/20240619-052754-root.json
05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65173 and previous config saved to /var/cache/conftool/dbconfig/20240619-051809-root.json
05:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P65172 and previous config saved to /var/cache/conftool/dbconfig/20240619-051659-root.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65171 and previous config saved to /var/cache/conftool/dbconfig/20240619-051248-root.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P65170 and previous config saved to /var/cache/conftool/dbconfig/20240619-051233-root.json
05:10 marostegui@cumin1002: dbctl commit (dc=all): 'repool db1169', diff saved to https://phabricator.wikimedia.org/P65169 and previous config saved to /var/cache/conftool/dbconfig/20240619-051014-marostegui.json
05:09 marostegui@cumin1002: dbctl commit (dc=all): 'test depool db1169', diff saved to https://phabricator.wikimedia.org/P65168 and previous config saved to /var/cache/conftool/dbconfig/20240619-050951-marostegui.json

2024-06-18

23:22 jforrester@deploy1002: Finished scap: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920) (duration: 17m 16s)
23:12 jforrester@deploy1002: jforrester, kemayo: Continuing with sync
23:10 jforrester@deploy1002: jforrester, kemayo: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:05 jforrester@deploy1002: Started scap: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920)
22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:20 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
22:07 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
21:26 jdrewniak@deploy1002: Finished scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) (duration: 16m 33s)
21:16 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
21:14 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:09 jdrewniak@deploy1002: Started scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844)
21:07 jdrewniak@deploy1002: Sync cancelled.
21:07 jdrewniak@deploy1002: jdrewniak, jdlrobson: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:03 jdrewniak@deploy1002: Started scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844)
20:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
20:50 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
20:50 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
20:49 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
20:49 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
20:47 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
20:47 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
20:33 urbanecm@deploy1002: Finished scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki (duration: 18m 59s)
20:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
20:22 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Continuing with sync
20:18 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:14 urbanecm@deploy1002: Started scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki
20:10 urbanecm@deploy1002: Sync cancelled.
20:10 urbanecm@deploy1002: urbanecm, superzerocool, kemayo: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:06 urbanecm@deploy1002: Started scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki
19:59 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
19:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:17 mutante: lists1001 - systemctl reset-failed - clean up systemd state due to units not found anymore after migration - disable puppet and then deploy gerrit:1047160 on lists to fix invalid unit name - T331706
18:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
18:44 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in esams for T365123
18:39 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in eqsin for T365123
18:33 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in drmrs for T365123
18:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
18:27 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in magru for T365123
18:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
18:17 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in ulsfo for T365123
18:16 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
18:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
17:37 swfrench-wmf: updated conftool to 3.0.0 on bullseye hosts in eqiad for T365123
17:35 swfrench-wmf: updated conftool to 3.0.0 on bookworm hosts in eqiad for T365123
17:34 swfrench-wmf: updated conftool to 3.0.0 on buster hosts in eqiad for T365123
17:21 cdanis: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323 T367894
17:16 swfrench-wmf: updated conftool to 3.0.0 on remaining bullseye hosts in codfw for T365123
17:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
17:14 swfrench-wmf: updated conftool to 3.0.0 on remaining bookworm hosts in codfw for T365123
17:12 swfrench-wmf: updated conftool to 3.0.0 on remaining buster hosts in codfw for T365123
16:42 swfrench-wmf: conftool on puppetmaster2001 updated to 3.0.0 for T365123
16:39 swfrench-wmf: validated dbctl 3.0.0 on cumin2002 (noop edit to note: on parsercache spare pc2014) for T365123
16:39 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
16:34 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
16:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
16:31 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
16:29 swfrench-wmf: conftool on cumin2002 updated to 3.0.0 for T365123
16:23 claime: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323
16:23 swfrench-wmf: depooled / pooled mw2441.codfw.wmnet to smoke-test python3-conftool for T365123
16:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
16:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65167 and previous config saved to /var/cache/conftool/dbconfig/20240618-162053-arnaudb.json
16:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
16:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65166 and previous config saved to /var/cache/conftool/dbconfig/20240618-160548-arnaudb.json
16:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
15:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
15:52 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
15:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
15:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:50 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65165 and previous config saved to /var/cache/conftool/dbconfig/20240618-155042-arnaudb.json
15:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65164 and previous config saved to /var/cache/conftool/dbconfig/20240618-155000-marostegui.json
15:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65163 and previous config saved to /var/cache/conftool/dbconfig/20240618-154938-marostegui.json
15:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-staging2003
15:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-staging2003
15:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:47 swfrench-wmf: included conftool 3.0.0 into buster/bullseye/bookworm-wikimedia on apt.w.o for T365123
15:47 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
15:46 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:45 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:44 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5032.*} and A:cp
15:43 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:42 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5032.*} and A:cp
15:41 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5030.*} and A:cp
15:39 fabfur: upgrade haproxy to v2.8.10 on cp5030,cp5032 (T367756)
15:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5030.*} and A:cp
15:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp3066.*} and A:cp
15:36 fabfur: upgrade haproxy to v2.8.10 on cp3066 (T367756)
15:35 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp3066.*} and A:cp
15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65162 and previous config saved to /var/cache/conftool/dbconfig/20240618-153537-arnaudb.json
15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65161 and previous config saved to /var/cache/conftool/dbconfig/20240618-153430-marostegui.json
15:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
15:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
15:23 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65159 and previous config saved to /var/cache/conftool/dbconfig/20240618-152031-arnaudb.json
15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65158 and previous config saved to /var/cache/conftool/dbconfig/20240618-151923-marostegui.json
15:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
15:07 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775 (duration: 00m 15s)
15:07 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775
15:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1002.eqiad.wmnet with OS bookworm
15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775 (duration: 00m 47s)
15:05 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775
15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775 (duration: 00m 36s)
15:04 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775
15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65157 and previous config saved to /var/cache/conftool/dbconfig/20240618-150416-marostegui.json
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:00 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4f7d29a]: (no justification provided) (duration: 00m 28s)
15:00 topranks: rebooting lsw1-f7-eqiad to upgrade JunOS on switch T365984
15:00 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
15:00 mforns@deploy1002: Started deploy [airflow-dags/analytics@4f7d29a]: (no justification provided)
14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
14:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
14:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
14:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
14:47 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
14:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1001.eqiad.wmnet with OS bookworm
14:44 jynus: reenable puppet on backup2002
14:40 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
14:40 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 depool - T365984', diff saved to https://phabricator.wikimedia.org/P65156 and previous config saved to /var/cache/conftool/dbconfig/20240618-143951-arnaudb.json
14:39 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
14:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
14:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
14:36 sukhe: enabling puppet and running puppet agent on cp4037
14:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
14:24 claime: trafficserver: move 100% of traffic to mw-on-k8s - T362323
14:23 btullis@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
14:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:21 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
14:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
14:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
14:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
14:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
14:19 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
14:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
14:09 swfrench-wmf: included conftool 3.0.0 into buster-wikimedia on apt.w.o for T365123
14:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
14:03 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
14:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1001.eqiad.wmnet with OS bookworm
13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:51 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:51 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
13:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
13:49 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1002.eqiad.wmnet with OS bookworm
13:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
13:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
13:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
13:47 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
13:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
13:39 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
13:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
13:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
13:34 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes azwiktionary --fix # T367264; 7 pages fixed, 10 links fixed
13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264) (duration: 16m 07s)
13:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
13:28 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
13:28 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
13:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
13:22 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:19 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1208.eqiad.wmnet
13:19 btullis@cumin1002: START - Cookbook sre.hosts.remove-downtime for db1208.eqiad.wmnet
13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264)
13:16 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
13:16 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
13:16 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
13:10 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: sync
13:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
13:09 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: sync
13:09 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: sync
13:08 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: sync
13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: sync
13:07 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: sync
13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
13:06 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
13:06 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
13:04 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
13:04 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
12:56 vgutierrez: rolling upgrade on A:cp-eqsin to fifo-log-demux 0.7.5 - T364383
12:53 vgutierrez: disable puppet on A:cp-eqsin before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047070 - T364383
12:52 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
12:51 marostegui: Deploy schema change on old s4 eqiad master db1160 dbmaint T364069
12:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P65155 and previous config saved to /var/cache/conftool/dbconfig/20240618-124945-root.json
12:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
12:47 fabfur: upgrade haproxy to v2.8.10 on all ulsfo cp hosts (T367756)
12:47 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
12:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
12:42 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:36 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
12:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
12:22 moritzm: rebalance ganeti eqiad/D following reboots
12:15 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
12:15 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
12:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
12:05 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
12:04 topranks: adding Netbox-generated IPv6 DNS records for wikikube-worker, mw and parse hosts
12:04 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
12:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
11:59 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
11:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
11:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
11:58 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:58 effie: Slowly pointing mediawiki in eqiad to mw-mcrouter daemonset - T346690
11:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
11:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
11:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:50 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists.wikimedia.org on all recursors
11:50 eoghan@cumin1002: START - Cookbook sre.dns.wipe-cache lists.wikimedia.org on all recursors
11:48 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1208.eqiad.wmnet with OS bookworm
11:42 marostegui: Delete ipblocks table on clouddb2002-dev (labtestwiki) T367632
11:40 marostegui: Rename ipblocks table on db1169 (enwiki) T367632
11:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
11:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
11:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
11:24 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
11:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
11:14 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
11:14 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
11:13 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
11:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
11:13 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
11:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65152 and previous config saved to /var/cache/conftool/dbconfig/20240618-111001-marostegui.json
11:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:09 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host db1208.eqiad.wmnet with OS bookworm
11:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65151 and previous config saved to /var/cache/conftool/dbconfig/20240618-110939-marostegui.json
11:08 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
11:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
11:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
11:05 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
11:01 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
10:58 fabfur: cp3066 repooled and puppet enabled (T367756)
10:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65150 and previous config saved to /var/cache/conftool/dbconfig/20240618-105432-marostegui.json
10:48 marostegui: dbmaint codfw s2 deploy schema change T364069
10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65149 and previous config saved to /var/cache/conftool/dbconfig/20240618-103925-marostegui.json
10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:30 moritzm: upload openjdk-21 21.0.3+9-2~deb12u2 for bookworm/wikimedia (secondary rebuild on build2001 following the initial bootstrap build) https://phabricator.wikimedia.org/T367487
10:30 cgoubert@deploy1002: Finished scap: Deploy statsd exporter - T365265 (duration: 03m 39s)
10:27 cgoubert@deploy1002: Started scap: Deploy statsd exporter - T365265
10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65148 and previous config saved to /var/cache/conftool/dbconfig/20240618-102418-marostegui.json
10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65147 and previous config saved to /var/cache/conftool/dbconfig/20240618-102130-root.json
10:14 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
10:14 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65146 and previous config saved to /var/cache/conftool/dbconfig/20240618-100624-root.json
10:05 fabfur: cp3066 currently depooled and puppet disabled for T367756
10:04 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet
09:53 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1019.eqiad.wmnet|wikikube-worker1020.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet),cluster=kubernetes,service=kubesvc
09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65145 and previous config saved to /var/cache/conftool/dbconfig/20240618-095119-root.json
09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65144 and previous config saved to /var/cache/conftool/dbconfig/20240618-093614-root.json
09:27 moritzm: arm keyholder on acmechief2002
09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65143 and previous config saved to /var/cache/conftool/dbconfig/20240618-092108-root.json
09:13 moritzm: rebooting ganeti2029
09:10 marostegui: dbmaint eqiad s4 deploy schema change T367261
09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65142 and previous config saved to /var/cache/conftool/dbconfig/20240618-090603-root.json
09:05 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
08:53 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.10 refs T361404
08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 depool to troubleshoot hardware issues', diff saved to https://phabricator.wikimedia.org/P65141 and previous config saved to /var/cache/conftool/dbconfig/20240618-085254-arnaudb.json
08:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
08:51 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65140 and previous config saved to /var/cache/conftool/dbconfig/20240618-085057-root.json
08:45 hashar@deploy1002: Finished deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate (duration: 00m 07s)
08:45 hashar@deploy1002: Started deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate
08:43 fabfur: cp4037 currently depooled and puppet disabled for T367756
08:41 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
08:40 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
08:34 marostegui: dbmaint eqiad s6 deploy schema change on eqiad master T364069
08:29 XioNoX: deploy pfw policy update 1718644831 - T367796
07:56 moritzm: uploaded python-irc 8.5.3+dfsg-4+wmf1 to apt.wikimedia.org T331702
07:40 marostegui: dbmaint codfw s7 deploy schema change on codfw master T364069
07:33 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
07:31 kart_: Updated cxserver to 2024-06-13-045621-production (T364122, T138401)
07:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
07:29 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
07:28 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
07:28 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
07:26 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
07:26 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
07:20 kartik@deploy1002: Finished scap: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) (duration: 16m 36s)
07:15 marostegui: dbmaint eqiad s5 deploy schema change on primary master T364069
07:12 marostegui: dbmaint codfw s4 deploy schema change T367261
07:12 marostegui: dbmaint codfw s4 deploy schema change
07:11 kartik@deploy1002: kartik: Continuing with sync
07:09 kartik@deploy1002: kartik: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:04 kartik@deploy1002: Started scap: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838)
06:52 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
06:52 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65139 and previous config saved to /var/cache/conftool/dbconfig/20240618-060100-marostegui.json
06:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
06:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
06:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65138 and previous config saved to /var/cache/conftool/dbconfig/20240618-060038-marostegui.json
05:55 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2102.codfw.wmnet
05:55 jynus@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
05:55 jynus@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
05:53 jynus@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
05:50 jynus@cumin2002: START - Cookbook sre.dns.netbox
05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65137 and previous config saved to /var/cache/conftool/dbconfig/20240618-054531-marostegui.json
05:44 jynus@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2102.codfw.wmnet
05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65136 and previous config saved to /var/cache/conftool/dbconfig/20240618-053024-marostegui.json
05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65135 and previous config saved to /var/cache/conftool/dbconfig/20240618-051517-marostegui.json
05:00 marostegui: dbmaint codfw s5 deploy schema change on db2213 T364299
04:57 marostegui: dbmaint eqiad s2 deploy schema change on db2207 T364299
04:54 marostegui: dbmaint eqiad s4 deploy schema change on db1160 T364299
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160 T367378', diff saved to https://phabricator.wikimedia.org/P65134 and previous config saved to /var/cache/conftool/dbconfig/20240618-044908-root.json
04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1238 to s4 primary and set section read-write T367378', diff saved to https://phabricator.wikimedia.org/P65133 and previous config saved to /var/cache/conftool/dbconfig/20240618-044806-marostegui.json
04:47 marostegui@cumin1002: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T367378', diff saved to https://phabricator.wikimedia.org/P65132 and previous config saved to /var/cache/conftool/dbconfig/20240618-044747-marostegui.json
04:47 marostegui: Starting s4 eqiad failover from db1160 to db1238 - T367378
04:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1238 with weight 0 T367378', diff saved to https://phabricator.wikimedia.org/P65131 and previous config saved to /var/cache/conftool/dbconfig/20240618-042054-marostegui.json
04:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
04:02 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.7 (duration: 02m 50s)
04:01 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.10 refs T361404 (duration: 58m 57s)
03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.10 refs T361404
01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65130 and previous config saved to /var/cache/conftool/dbconfig/20240618-013639-marostegui.json
01:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
01:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65129 and previous config saved to /var/cache/conftool/dbconfig/20240618-013616-marostegui.json
01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65128 and previous config saved to /var/cache/conftool/dbconfig/20240618-012109-marostegui.json
01:10 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65127 and previous config saved to /var/cache/conftool/dbconfig/20240618-010601-marostegui.json
00:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS bullseye
00:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65126 and previous config saved to /var/cache/conftool/dbconfig/20240618-005054-marostegui.json
00:34 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
00:31 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65125 and previous config saved to /var/cache/conftool/dbconfig/20240618-002823-ladsgroup.json
00:18 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 14m 03s)
00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65124 and previous config saved to /var/cache/conftool/dbconfig/20240618-001316-ladsgroup.json
00:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
00:10 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4044.ulsfo.wmnet with OS bullseye
00:05 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=u4cwiki --cluster=all 2>&1 | tee /tmp/u4c.UpdateSearchIndexConfig.log # T366649
00:04 zabe@deploy1002: Started scap: Update interwiki cache
00:02 zabe@deploy1002: Finished scap: T366649 (duration: 15m 16s)
00:00 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye

2024-06-17

23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65123 and previous config saved to /var/cache/conftool/dbconfig/20240617-235809-ladsgroup.json
23:52 zabe@deploy1002: zabe: Continuing with sync
23:52 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
23:51 zabe@deploy1002: zabe: T366649 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:48 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=arbcom_itwiki --cluster=all 2>&1 | tee /tmp/arbcom_it.UpdateSearchIndexConfig.log # T363825
23:47 zabe@deploy1002: Started scap: T366649
23:46 zabe: Create a 'Universal Code of Conduct Coordinating Committee (U4C)' private wiki # T366649
23:44 zabe@deploy1002: Finished scap: T363825 (duration: 15m 00s)
23:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65122 and previous config saved to /var/cache/conftool/dbconfig/20240617-234302-ladsgroup.json
23:34 zabe@deploy1002: zabe: Continuing with sync
23:34 zabe@deploy1002: zabe: T363825 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:29 zabe@deploy1002: Started scap: T363825
23:29 zabe: create private wiki for itwiki arbcom # T363825
23:23 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
23:14 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
22:52 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
22:49 cdobbins@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
22:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bookworm
22:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65121 and previous config saved to /var/cache/conftool/dbconfig/20240617-223010-ladsgroup.json
22:28 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
22:26 cdobbins@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
22:25 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
22:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65120 and previous config saved to /var/cache/conftool/dbconfig/20240617-221503-ladsgroup.json
22:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
22:11 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
22:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65119 and previous config saved to /var/cache/conftool/dbconfig/20240617-215956-ladsgroup.json
21:59 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
21:55 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bookworm
21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65118 and previous config saved to /var/cache/conftool/dbconfig/20240617-214449-ladsgroup.json
21:41 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
21:20 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bookworm
21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet
21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=4043.ulsfo.wmnet
21:05 jforrester@deploy1002: Finished scap: Backport for Fix styles for new heading HTML (T367468) (duration: 18m 57s)
20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65117 and previous config saved to /var/cache/conftool/dbconfig/20240617-205955-marostegui.json
20:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
20:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
20:55 jforrester@deploy1002: jforrester: Continuing with sync
20:52 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
20:50 jforrester@deploy1002: jforrester: Backport for Fix styles for new heading HTML (T367468) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:50 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
20:46 jforrester@deploy1002: Started scap: Backport for Fix styles for new heading HTML (T367468)
20:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bookworm
20:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bookworm
20:08 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
20:07 jforrester@deploy1002: jforrester: Continuing with sync
20:07 jforrester@deploy1002: jforrester: Backport for [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), Add a note that you cannot change wgCategoryCollation easily (T362494 T366809) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
20:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
20:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
20:02 jforrester@deploy1002: Started scap: Backport for [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), Add a note that you cannot change wgCategoryCollation easily (T362494 T366809)
19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65116 and previous config saved to /var/cache/conftool/dbconfig/20240617-195520-ladsgroup.json
19:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
19:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
19:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bookworm
19:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
19:38 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
19:22 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bookworm
19:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
19:15 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
18:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
18:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
18:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
18:42 ladsgroup@deploy1002: Finished scap: Backport for Change static footer icons to the new one (T256190), Remove footer override (duration: 17m 12s)
18:36 ejegg: fundraising civicrm upgraded from 66acce1f to a25a359b
18:36 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bookworm
18:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bookworm
18:30 ladsgroup@deploy1002: ladsgroup, jforrester: Continuing with sync
18:29 ladsgroup@deploy1002: ladsgroup, jforrester: Backport for Change static footer icons to the new one (T256190), Remove footer override synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:24 ladsgroup@deploy1002: Started scap: Backport for Change static footer icons to the new one (T256190), Remove footer override
18:19 ladsgroup@deploy1002: Started scap: Backport for Change static footer icons to the new one (T256190)
18:17 ejegg: standalone SmashPig upgraded from 1d1b770c to c8993ec6
18:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: sync
18:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: sync
18:11 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
18:10 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
18:09 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
18:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
18:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
18:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
18:07 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
18:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
18:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
18:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
18:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
18:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
18:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
18:02 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
18:01 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
18:00 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
17:58 ejegg: fundraising civicrm upgraded from aa127608 to 66acce1f
17:53 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
17:53 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
17:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bookworm
17:37 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
17:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
17:35 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
17:34 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
17:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
17:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
17:32 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
17:31 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
17:30 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
17:29 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
17:18 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4042.ulsfo.wmnet
17:17 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
17:16 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
17:07 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
17:06 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
17:05 claime: Pooling and uncordoning wikikube-worker1019.eqiad.wmnet,wikikube-worker1020.eqiad.wmnet,wikikube-worker1021.eqiad.wmnet - T351074
17:02 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
16:59 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: sync
16:59 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: sync
16:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1021.eqiad.wmnet with OS bullseye
16:58 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
16:58 claime: homer 'cr*eqiad*' commit 'T351074'
16:58 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
16:43 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
16:43 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: sync
16:42 mnz@deploy1002: Finished deploy [airflow-dags/research@5e1cd80]: (no justification provided) (duration: 00m 32s)
16:42 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
16:42 mnz@deploy1002: Started deploy [airflow-dags/research@5e1cd80]: (no justification provided)
16:42 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
16:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
16:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
16:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
16:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
16:29 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
16:28 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
16:27 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
16:27 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
16:26 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
16:25 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
16:25 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1003.eqiad.wmnet
16:25 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:25 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
16:24 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
16:21 andrew@cumin1002: START - Cookbook sre.dns.netbox
16:16 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1003.eqiad.wmnet
16:16 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
16:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:14 andrew@cumin1002: START - Cookbook sre.dns.netbox
16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
16:09 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 13s)
16:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
16:09 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1019.eqiad.wmnet with OS bullseye
16:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
16:05 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
16:05 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:03 andrew@cumin1002: START - Cookbook sre.dns.netbox
16:00 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
15:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:57 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:57 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:56 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
15:56 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:56 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:55 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:55 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
15:52 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 41s)
15:50 topranks: rebooting cr2-eqdfw to upgrade JunOS T364092
15:49 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1021.eqiad.wmnet with OS bullseye
15:46 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1020.eqiad.wmnet with OS bullseye
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
15:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:46 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1489 to wikikube-worker1021
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1021
15:44 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1021
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
15:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
15:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:41 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:41 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1489 to wikikube-worker1021
15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1447 to wikikube-worker1020
15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1020
15:39 topranks: deactivate Transit and peering sessions on cr2-eqdfw in advance of power-supply change T366864
15:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
15:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1020
15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
15:37 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:37 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
15:34 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:34 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1447 to wikikube-worker1020
15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1444 to wikikube-worker1019
15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1019
15:32 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1019
15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
15:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
15:31 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
15:31 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
15:29 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2002.codfw.wmnet
15:29 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:29 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
15:28 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:28 fabfur: upgrading haproxy to 2.8.10 on cp4037 (T367756)
15:28 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4037.*} and A:cp
15:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4037.*} and A:cp
15:26 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
15:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1444 to wikikube-worker1019
15:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
15:24 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
15:23 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:21 claime: Depooling mw1444.eqiad.wmnet,mw1447.eqiad.wmnet,mw1489.eqiad.wmnet for reimage - T351074
15:20 topranks: draining transport circuits in/out of eqdfw in advance of router power-supply work/upgrade T366864
15:17 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
15:17 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
15:16 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wikikube-ctrl2002.codfw.wmnet
15:16 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
15:10 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
15:03 claime: Repooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet pending fw upgrade - T351074
15:03 cgoubert@cumin1002: conftool action : set/weight=30:pooled=yes; selector: name=(mw1359.eqiad.wmnet|mw1364.eqiad.wmnet|mw1365.eqiad.wmnet|mw1412.eqiad.wmnet)
14:59 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
14:58 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
14:58 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
14:56 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
14:56 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
14:55 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
14:55 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Wikimedia Foundation/Legal/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Zabe" --reason "per request T367216"
14:54 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
14:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:50 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement/Short" "Wikimedia Foundation/Legal/Committee appointments/Announcement/Short" "Zabe" --reason "per request T367216"
14:48 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1412.eqiad.wmnet
14:48 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1412.eqiad.wmnet
14:48 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
14:47 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement" "Wikimedia Foundation/Legal/Committee appointments/Announcement" "Zabe" --reason "per request T367216"
14:45 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1365.eqiad.wmnet
14:45 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1365.eqiad.wmnet
14:44 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:44 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2001.codfw.wmnet
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:44 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:43 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
14:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
14:41 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments" "Wikimedia Foundation/Legal/Committee appointments" "Zabe" --reason "per request T367216"
14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
14:39 joal@deploy1002: Finished deploy [airflow-dags/analytics@b682892]: (no justification provided) (duration: 00m 33s)
14:38 joal@deploy1002: Started deploy [airflow-dags/analytics@b682892]: (no justification provided)
14:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Tools and processes" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Tools and processes" "Zabe" --reason "per request T367217"
14:36 kamila@cumin1002: START - Cookbook sre.dns.netbox
14:34 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources/What is a conduct warning" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources/What is a conduct warning" "Zabe" --reason "per request T367217"
14:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
14:31 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources" "Zabe" --reason "per request T367217"
14:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
14:29 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
14:28 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Legal agreement" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Legal agreement" "Zabe" --reason "per request T367217"
14:27 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Brand Stewardship Report" "Wikimedia Foundation/Legal/Brand Stewardship Report" "Zabe" --reason "per request T367216"
14:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1359.eqiad.wmnet
14:23 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1359.eqiad.wmnet
14:23 taavi@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:22 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.eqiad.wmnet
14:21 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2001.codfw.wmnet
14:21 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Announcement/2023 OC and CRC appointments process" "Wikimedia Foundation/Legal/Announcement/2023 OC and CRC appointments process" "Zabe" --reason "per request T367216"
14:18 claime: Depooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet for reimage - T351074
14:17 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
14:17 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
14:17 urbanecm@deploy1002: Finished scap: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) (duration: 15m 34s)
14:16 Amir1: killing updateMenteeData.php --wiki=enwiki --statsd --dbshard s1
14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
14:11 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
14:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/talkheader" "Wikimedia Foundation/Legal/2023 ToU updates/talkheader" "Zabe" --reason "per request T367216"
14:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:07 taavi@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudvirt-wdqs1001.eqiad.wmnet
14:06 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Proposed update" "Wikimedia Foundation/Legal/2023 ToU updates/Proposed update" "Zabe" --reason "per request T367216"
14:06 urbanecm@deploy1002: urbanecm: Continuing with sync
14:06 vgutierrez: rolling upgrade on A:cp-codfw to fifo-log-demux 0.7.5 - T364383
14:05 urbanecm@deploy1002: urbanecm: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Charter" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Charter" "Zabe" --reason "per request T367217"
14:02 vgutierrez: disable puppet on A:cp-codfw before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046681 - T364383
14:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Call for applicants" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Call for applicants" "Zabe" --reason "per request T367217"
14:01 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
14:01 urbanecm@deploy1002: Started scap: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895)
14:01 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
14:00 urbanecm@deploy1002: Finished scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) (duration: 16m 47s)
13:54 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bookworm
13:51 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
13:50 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Continuing with sync
13:48 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
13:48 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:48 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:45 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
13:44 urbanecm@deploy1002: Started scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153)
13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:43 urbanecm@deploy1002: Sync cancelled.
13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:43 urbanecm@deploy1002: lucaswerkmeister-wmde, urbanecm: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65112 and previous config saved to /var/cache/conftool/dbconfig/20240617-133951-ladsgroup.json
13:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
13:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
13:37 urbanecm@deploy1002: Started scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153)
13:34 claime: Drained and cordoned wikikube-ctrl2001.codfw.wmnet wikikube-ctrl2002.codfw.wmnet
13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
13:33 claime: Uncordoned wikikube-ctrl2003.codfw.wmnet
13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
13:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
13:25 urbanecm@deploy1002: Finished scap: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801) (duration: 23m 07s)
13:24 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2002.codfw.wmnet
13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.codfw.wmnet
13:14 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-jumbo-eqiad
13:13 vgutierrez: rolling upgrade on A:cp-ulsfo to fifo-log-demux 0.7.5 - T364383
13:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65111 and previous config saved to /var/cache/conftool/dbconfig/20240617-131222-ladsgroup.json
13:10 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Continuing with sync
13:07 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bookworm
13:03 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
13:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
13:03 vgutierrez: disable puppet on A:cp-ulsfo before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046665 - T364383
13:02 urbanecm@deploy1002: Started scap: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801)
12:59 joal@deploy1002: Finished deploy [airflow-dags/analytics@a8843e6]: (no justification provided) (duration: 00m 03s)
12:59 joal@deploy1002: Started deploy [airflow-dags/analytics@a8843e6]: (no justification provided)
12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65110 and previous config saved to /var/cache/conftool/dbconfig/20240617-125715-ladsgroup.json
12:53 vgutierrez: upload fifo-log-demux 0.7.5 to apt.wm.o (bullseye-wikimedia)
12:47 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65109 and previous config saved to /var/cache/conftool/dbconfig/20240617-124207-ladsgroup.json
12:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
12:34 vgutierrez: upgrading HAProxy to version 2.8.10 on cp4051
12:34 vgutierrez: fetch HAProxy 2.8.10 into thirdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o)
12:28 jynus: restarting ms-backup100[12], backup1004-7,11
12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65108 and previous config saved to /var/cache/conftool/dbconfig/20240617-122700-ladsgroup.json
12:14 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2003.codfw.wmnet|wikikube-worker2004.codfw.wmnet|wikikube-worker2007.codfw.wmnet|wikikube-worker2008.codfw.wmnet|wikikube-worker2009.codfw.wmnet|wikikube-worker2010.codfw.wmnet),cluster=kubernetes,service=kubesvc
12:14 claime: pooling and uncordoning wikikube-worker2003.codfw.wmnet wikikube-worker2004.codfw.wmnet wikikube-worker2007.codfw.wmnet wikikube-worker2008.codfw.wmnet wikikube-worker2009.codfw.wmnet wikikube-worker2010.codfw.wmnet - T351074
12:09 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 15830
12:07 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
12:04 jynus: restart db1204, db1205
12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2008.codfw.wmnet with OS bullseye
12:03 claime: homer 'cr*codfw*' commit 'T351074'
12:02 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bookworm
12:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: archiva
12:01 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2003.codfw.wmnet
12:01 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.codfw.wmnet
11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2010.codfw.wmnet with OS bullseye
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker2003.codfw.wmnet
11:54 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2009.codfw.wmnet with OS bullseye
11:53 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: archiva
11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2003.codfw.wmnet with OS bullseye
11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2007.codfw.wmnet with OS bullseye
11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2004.codfw.wmnet with OS bullseye
11:47 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
11:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
11:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee" "Zabe" --reason "per request T367217"
11:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
11:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
11:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
11:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
11:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Reminder" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Reminder" "Zabe" --reason "per request T367216"
11:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
11:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Announcement" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Announcement" "Zabe" --reason "per request T367216"
11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
11:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
11:23 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
11:22 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours" "Zabe" --reason "per request T367216"
11:17 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/LandingCNTranslate" "Wikimedia Foundation/Legal/2023 ToU updates/LandingCNTranslate" "Zabe" --reason "per request T367216"
11:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
11:17 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
11:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
11:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bookworm
11:13 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
11:13 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
11:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/About" "Wikimedia Foundation/Legal/2023 ToU updates/About" "Zabe" --reason "per request T367216"
11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2010.codfw.wmnet with OS bullseye
11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2009.codfw.wmnet with OS bullseye
11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2008.codfw.wmnet with OS bullseye
11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2007.codfw.wmnet with OS bullseye
11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2004.codfw.wmnet with OS bullseye
11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2003.codfw.wmnet with OS bullseye
11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2329 to wikikube-worker2010
11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2010
11:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2010
11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
11:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
11:03 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates" "Wikimedia Foundation/Legal/2023 ToU updates" "Zabe" --reason "per request T367216"
11:01 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
10:59 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
10:58 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:57 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2329 to wikikube-worker2010
10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2328 to wikikube-worker2009
10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2009
10:55 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2009
10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
10:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
10:52 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
10:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2328 to wikikube-worker2009
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2327 to wikikube-worker2008
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2008
10:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2008
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
10:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
10:48 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
10:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2327 to wikikube-worker2008
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2326 to wikikube-worker2007
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2007
10:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2007
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
10:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
10:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2326 to wikikube-worker2007
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2324 to wikikube-worker2004
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2004
10:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2004
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
10:38 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
10:37 jynus: restarting ms-backup200[12], backup2004-7,11
10:35 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:35 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2324 to wikikube-worker2004
10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2323 to wikikube-worker2003
10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2003
10:34 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2003
10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
10:34 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2003
10:34 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2003
10:33 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
10:31 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:31 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2323 to wikikube-worker2003
10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65107 and previous config saved to /var/cache/conftool/dbconfig/20240617-102938-marostegui.json
10:26 jynus: restarting db2183, db2184
10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
10:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65106 and previous config saved to /var/cache/conftool/dbconfig/20240617-101431-marostegui.json
10:11 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:10 kamila@cumin1002: START - Cookbook sre.dns.netbox
10:09 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
10:08 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage
10:01 claime: draining and cordoning mw2321 - T367702
10:01 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-jumbo-eqiad
10:01 taavi@deploy1002: Finished scap: Backport for Stop loading OSM i18n (T161553) (duration: 34m 07s)
09:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65104 and previous config saved to /var/cache/conftool/dbconfig/20240617-095924-marostegui.json
09:54 jayme@deploy1002: Finished deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1 (duration: 00m 24s)
09:53 jayme@deploy1002: Started deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1
09:52 jayme@deploy1002: Finished deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1 (duration: 00m 38s)
09:51 jayme@deploy1002: Started deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1
09:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:49 taavi@deploy1002: taavi: Continuing with sync
09:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65103 and previous config saved to /var/cache/conftool/dbconfig/20240617-094926-marostegui.json
09:48 taavi@deploy1002: taavi: Backport for Stop loading OSM i18n (T161553) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65102 and previous config saved to /var/cache/conftool/dbconfig/20240617-094417-marostegui.json
09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65101 and previous config saved to /var/cache/conftool/dbconfig/20240617-094034-marostegui.json
09:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
09:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65100 and previous config saved to /var/cache/conftool/dbconfig/20240617-093427-marostegui.json
09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65099 and previous config saved to /var/cache/conftool/dbconfig/20240617-093419-marostegui.json
09:26 taavi@deploy1002: Started scap: Backport for Stop loading OSM i18n (T161553)
09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65098 and previous config saved to /var/cache/conftool/dbconfig/20240617-091920-marostegui.json
09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65097 and previous config saved to /var/cache/conftool/dbconfig/20240617-091912-marostegui.json
09:05 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-test-eqiad
09:04 _joe_: removed damaged AOF file for redis rdb1014-6379, resyncing with primary
09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65096 and previous config saved to /var/cache/conftool/dbconfig/20240617-090413-marostegui.json
09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65095 and previous config saved to /var/cache/conftool/dbconfig/20240617-090405-marostegui.json
09:01 urbanecm@deploy1002: Finished scap: Backport for throttle: Fix exemption for ongoing course (duration: 25m 05s)
08:53 claime: hardcycling rdb1014
08:49 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet
08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65094 and previous config saved to /var/cache/conftool/dbconfig/20240617-084906-marostegui.json
08:40 claime: powercycling rdb1014
08:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
08:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65093 and previous config saved to /var/cache/conftool/dbconfig/20240617-083755-marostegui.json
08:36 urbanecm@deploy1002: Started scap: Backport for throttle: Fix exemption for ongoing course
08:25 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-test-eqiad
08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65092 and previous config saved to /var/cache/conftool/dbconfig/20240617-082248-marostegui.json
08:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65091 and previous config saved to /var/cache/conftool/dbconfig/20240617-080741-marostegui.json
07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65090 and previous config saved to /var/cache/conftool/dbconfig/20240617-075234-marostegui.json
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65089 and previous config saved to /var/cache/conftool/dbconfig/20240617-074542-marostegui.json
07:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
07:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65088 and previous config saved to /var/cache/conftool/dbconfig/20240617-074530-marostegui.json
07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65087 and previous config saved to /var/cache/conftool/dbconfig/20240617-073023-marostegui.json
07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65086 and previous config saved to /var/cache/conftool/dbconfig/20240617-071516-marostegui.json
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65085 and previous config saved to /var/cache/conftool/dbconfig/20240617-070009-marostegui.json
06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65084 and previous config saved to /var/cache/conftool/dbconfig/20240617-065647-ladsgroup.json
06:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65083 and previous config saved to /var/cache/conftool/dbconfig/20240617-065625-ladsgroup.json
06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65082 and previous config saved to /var/cache/conftool/dbconfig/20240617-065357-marostegui.json
06:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
06:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65081 and previous config saved to /var/cache/conftool/dbconfig/20240617-065335-marostegui.json
06:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65080 and previous config saved to /var/cache/conftool/dbconfig/20240617-064118-ladsgroup.json
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65079 and previous config saved to /var/cache/conftool/dbconfig/20240617-063923-root.json
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65078 and previous config saved to /var/cache/conftool/dbconfig/20240617-063826-marostegui.json
06:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65077 and previous config saved to /var/cache/conftool/dbconfig/20240617-062612-ladsgroup.json
06:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65076 and previous config saved to /var/cache/conftool/dbconfig/20240617-062511-root.json
06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65075 and previous config saved to /var/cache/conftool/dbconfig/20240617-062418-root.json
06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65074 and previous config saved to /var/cache/conftool/dbconfig/20240617-062319-marostegui.json
06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65073 and previous config saved to /var/cache/conftool/dbconfig/20240617-061105-ladsgroup.json
06:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65072 and previous config saved to /var/cache/conftool/dbconfig/20240617-061006-root.json
06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65071 and previous config saved to /var/cache/conftool/dbconfig/20240617-060913-root.json
06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65070 and previous config saved to /var/cache/conftool/dbconfig/20240617-060812-marostegui.json
06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65069 and previous config saved to /var/cache/conftool/dbconfig/20240617-060352-marostegui.json
06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65068 and previous config saved to /var/cache/conftool/dbconfig/20240617-060326-marostegui.json
05:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65067 and previous config saved to /var/cache/conftool/dbconfig/20240617-055501-root.json
05:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65066 and previous config saved to /var/cache/conftool/dbconfig/20240617-055407-root.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65065 and previous config saved to /var/cache/conftool/dbconfig/20240617-054819-marostegui.json
05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65064 and previous config saved to /var/cache/conftool/dbconfig/20240617-053955-root.json
05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65063 and previous config saved to /var/cache/conftool/dbconfig/20240617-053902-root.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65062 and previous config saved to /var/cache/conftool/dbconfig/20240617-053312-marostegui.json
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65061 and previous config saved to /var/cache/conftool/dbconfig/20240617-052450-root.json
05:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65060 and previous config saved to /var/cache/conftool/dbconfig/20240617-052355-root.json
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65059 and previous config saved to /var/cache/conftool/dbconfig/20240617-051805-marostegui.json
05:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65058 and previous config saved to /var/cache/conftool/dbconfig/20240617-050944-root.json
05:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65057 and previous config saved to /var/cache/conftool/dbconfig/20240617-050852-marostegui.json
05:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65056 and previous config saved to /var/cache/conftool/dbconfig/20240617-050849-root.json
05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65055 and previous config saved to /var/cache/conftool/dbconfig/20240617-050756-marostegui.json
05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367261)', diff saved to https://phabricator.wikimedia.org/P65054 and previous config saved to /var/cache/conftool/dbconfig/20240617-050324-marostegui.json
05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance

2024-06-16

22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65053 and previous config saved to /var/cache/conftool/dbconfig/20240616-221944-ladsgroup.json
22:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65052 and previous config saved to /var/cache/conftool/dbconfig/20240616-221921-ladsgroup.json
22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65051 and previous config saved to /var/cache/conftool/dbconfig/20240616-220414-ladsgroup.json
21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65050 and previous config saved to /var/cache/conftool/dbconfig/20240616-214907-ladsgroup.json
21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65049 and previous config saved to /var/cache/conftool/dbconfig/20240616-213400-ladsgroup.json
14:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65047 and previous config saved to /var/cache/conftool/dbconfig/20240616-140214-ladsgroup.json
14:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
14:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
14:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65046 and previous config saved to /var/cache/conftool/dbconfig/20240616-140152-ladsgroup.json
13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65045 and previous config saved to /var/cache/conftool/dbconfig/20240616-134645-ladsgroup.json
13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65044 and previous config saved to /var/cache/conftool/dbconfig/20240616-133137-ladsgroup.json
13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65043 and previous config saved to /var/cache/conftool/dbconfig/20240616-131630-ladsgroup.json
05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65042 and previous config saved to /var/cache/conftool/dbconfig/20240616-055411-ladsgroup.json
05:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
05:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65041 and previous config saved to /var/cache/conftool/dbconfig/20240616-055359-ladsgroup.json
05:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65040 and previous config saved to /var/cache/conftool/dbconfig/20240616-053852-ladsgroup.json
05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65039 and previous config saved to /var/cache/conftool/dbconfig/20240616-052345-ladsgroup.json
05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65038 and previous config saved to /var/cache/conftool/dbconfig/20240616-050838-ladsgroup.json
03:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
03:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65037 and previous config saved to /var/cache/conftool/dbconfig/20240616-032102-marostegui.json
03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65036 and previous config saved to /var/cache/conftool/dbconfig/20240616-030555-marostegui.json
02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65035 and previous config saved to /var/cache/conftool/dbconfig/20240616-025048-marostegui.json
02:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65034 and previous config saved to /var/cache/conftool/dbconfig/20240616-023541-marostegui.json
00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65033 and previous config saved to /var/cache/conftool/dbconfig/20240616-000421-ladsgroup.json
00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
00:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65032 and previous config saved to /var/cache/conftool/dbconfig/20240616-000343-ladsgroup.json

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

2010s

2020s