Server Admin Log

2025-04-26

06:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
06:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
03:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2086.codfw.wmnet with OS bullseye
02:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2086.codfw.wmnet with reason: host reimage
02:51 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2086.codfw.wmnet with reason: host reimage
02:49 sbassett: Deployed security fix for T392746
02:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2086
02:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2086
02:34 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2086
02:34 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2086.codfw.wmnet 179.48.192.10.in-addr.arpa 9.7.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
02:34 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2086.codfw.wmnet 179.48.192.10.in-addr.arpa 9.7.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
02:34 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:34 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2086 - bking@cumin2002"
02:34 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2086 - bking@cumin2002"

2025-04-25

21:56 bking@cumin2002: START - Cookbook sre.dns.netbox
21:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2086
21:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2086.codfw.wmnet with OS bullseye
20:58 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2086.codfw.wmnet with OS bullseye
20:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2086.codfw.wmnet with OS bullseye
20:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2086 to cirrussearch2086
20:53 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2086
20:53 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2086
20:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:53 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2086 to cirrussearch2086 - bking@cumin2002"
20:52 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2086 to cirrussearch2086 - bking@cumin2002"
20:47 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
20:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
20:34 bking@cumin2002: START - Cookbook sre.dns.netbox
20:34 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2086 to cirrussearch2086
20:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:47 cstone: payments-wiki upgraded from c6ba1f35 to e7f66569
19:39 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
19:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host relforge1010.eqiad.wmnet with OS bullseye
19:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:26 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on relforge1010.eqiad.wmnet with reason: host reimage
19:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on relforge1010.eqiad.wmnet with reason: host reimage
19:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2084.codfw.wmnet with OS bullseye
18:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host relforge1010.eqiad.wmnet with OS bullseye
18:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2084.codfw.wmnet with reason: host reimage
18:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:28 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2084.codfw.wmnet with reason: host reimage
18:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
18:23 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
18:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
18:18 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
18:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
18:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2084
18:14 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2084
18:13 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2084
18:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2084.codfw.wmnet 56.48.192.10.in-addr.arpa 6.5.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
18:13 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2084.codfw.wmnet 56.48.192.10.in-addr.arpa 6.5.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
18:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:13 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2084 - bking@cumin2002"
18:13 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2084 - bking@cumin2002"
18:08 bking@cumin2002: START - Cookbook sre.dns.netbox
17:14 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2084
17:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2084.codfw.wmnet with OS bullseye
17:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2084.codfw.wmnet on all recursors
17:13 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2084.codfw.wmnet on all recursors
17:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2084 to cirrussearch2084
17:12 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2084
17:08 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2084
17:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:08 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2084 to cirrussearch2084 - bking@cumin2002"
17:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2084 to cirrussearch2084 - bking@cumin2002"
17:03 bking@cumin2002: START - Cookbook sre.dns.netbox
17:03 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2084 to cirrussearch2084
16:58 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2078.codfw.wmnet with OS bullseye
16:56 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
16:56 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
16:47 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
16:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2083.codfw.wmnet with OS bullseye
16:44 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
16:31 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
16:27 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
16:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2083.codfw.wmnet with reason: host reimage
16:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2083.codfw.wmnet with reason: host reimage
16:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2083
16:08 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2083
16:08 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2083
16:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2083.codfw.wmnet 88.32.192.10.in-addr.arpa 8.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:07 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2083.codfw.wmnet 88.32.192.10.in-addr.arpa 8.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:07 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2083 - bking@cumin2002"
16:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2083 - bking@cumin2002"
16:03 bking@cumin2002: START - Cookbook sre.dns.netbox
16:03 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2083
16:03 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2083.codfw.wmnet with OS bullseye
16:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2083.codfw.wmnet on all recursors
16:02 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2083.codfw.wmnet on all recursors
16:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2083 to cirrussearch2083
15:59 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2083
15:59 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2083
15:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:59 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2083 to cirrussearch2083 - bking@cumin2002"
15:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2083 to cirrussearch2083 - bking@cumin2002"
15:55 bking@cumin2002: START - Cookbook sre.dns.netbox
15:55 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2083 to cirrussearch2083
15:54 dancy@deploy1003: Installation of scap version "4.157.1" completed for 2 hosts
15:52 dancy@deploy1003: Installing scap version "4.157.1" for 2 host(s)
15:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
15:38 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
15:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
15:29 dancy@deploy1003: Installation of scap version "4.157.0" completed for 2 hosts
15:27 dancy@deploy1003: Installing scap version "4.157.0" for 2 host(s)
15:18 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
15:10 dancy: dancy@deploy1003 Cancelled
15:09 dancy@deploy1003: Installing scap version "4.156.0" for 2 host(s)
15:07 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
15:07 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis nupwiki in section s5
15:06 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis nupwiki in section s5
14:46 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2081.codfw.wmnet with OS bullseye
14:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2081.codfw.wmnet with reason: host reimage
14:26 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis nupwiki in section s5
14:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2081.codfw.wmnet with reason: host reimage
14:23 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis nupwiki in section s5
14:13 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudlb2001-dev.codfw.wmnet
14:13 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2081
14:11 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2081
14:11 andrew@cumin1002: START - Cookbook sre.dns.netbox
14:10 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2081
14:10 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2081.codfw.wmnet 86.32.192.10.in-addr.arpa 6.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:10 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2081.codfw.wmnet 86.32.192.10.in-addr.arpa 6.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:10 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:10 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2081 - bking@cumin2002"
14:10 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2081 - bking@cumin2002"
14:09 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
14:05 bking@cumin2002: START - Cookbook sre.dns.netbox
14:03 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
14:02 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudlb2001-dev.codfw.wmnet
13:59 Emperor: restart object-replicator on ms-be2089
13:51 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2081
13:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2081.codfw.wmnet with OS bullseye
13:51 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
13:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2081.codfw.wmnet on all recursors
13:51 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2081.codfw.wmnet on all recursors
13:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2081 to cirrussearch2081
13:48 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2081
13:48 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2081
13:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2081 to cirrussearch2081 - bking@cumin2002"
13:47 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2081 to cirrussearch2081 - bking@cumin2002"
13:46 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
13:45 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis nupwiki in section s5
13:43 taavi@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2004-dev.codfw.wmnet
13:43 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis nupwiki in section s5
13:36 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
13:34 taavi@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudlb2004-dev.codfw.wmnet
13:33 taavi: add cloudlb2004-dev bgp session to cloudsw1-b1-codfw T377126
13:33 bking@cumin2002: START - Cookbook sre.dns.netbox
13:32 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2081 to cirrussearch2081
13:31 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
13:29 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis nupwiki in section s5
13:26 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis nupwiki in section s5
13:08 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS trixie
12:58 vgutierrez: restarting grafana-server.service @ grafana1002.eqiad.wmnet
11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
11:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
11:31 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS trixie
09:31 moritzm: restarting puppetserver on puppetserver1002 (apparently needs a restart which per timing seems related to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1138904)
09:16 vgutierrez: restarting puppetserver on puppetserver1003
09:11 taavi: removed cloudlb2001-dev bgp session from cloudsw1-b1-codfw T377126
08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2032 to es1 master T391921', diff saved to https://phabricator.wikimedia.org/P75463 and previous config saved to /var/cache/conftool/dbconfig/20250425-082420-marostegui.json
07:50 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2002-dev.codfw.wmnet
07:44 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on krb1002.eqiad.wmnet with reason: work in progress, not yet active
07:38 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
07:13 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75462 and previous config saved to /var/cache/conftool/dbconfig/20250425-071339-root.json
06:58 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75461 and previous config saved to /var/cache/conftool/dbconfig/20250425-065834-root.json
06:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75460 and previous config saved to /var/cache/conftool/dbconfig/20250425-064329-root.json
06:28 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75459 and previous config saved to /var/cache/conftool/dbconfig/20250425-062824-root.json
06:13 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75458 and previous config saved to /var/cache/conftool/dbconfig/20250425-061319-root.json
05:58 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75457 and previous config saved to /var/cache/conftool/dbconfig/20250425-055813-root.json
05:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75456 and previous config saved to /var/cache/conftool/dbconfig/20250425-054308-root.json
05:42 marostegui@dns1006: END - running authdns-update
05:39 marostegui@dns1006: START - running authdns-update
05:37 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1032 to es1 master T391921', diff saved to https://phabricator.wikimedia.org/P75455 and previous config saved to /var/cache/conftool/dbconfig/20250425-053744-marostegui.json
05:28 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75454 and previous config saved to /var/cache/conftool/dbconfig/20250425-052802-root.json
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75453 and previous config saved to /var/cache/conftool/dbconfig/20250425-051257-root.json
05:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2030.codfw.wmnet with reason: Maintenance
05:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2030 T391921', diff saved to https://phabricator.wikimedia.org/P75452 and previous config saved to /var/cache/conftool/dbconfig/20250425-050538-marostegui.json

2025-04-24

23:47 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f1-codfw
23:47 pt1979@cumin2002: START - Cookbook sre.network.tls for network device lsw1-f1-codfw
23:42 eileen: config revision changed from 7bf2c087 to 1c84d1a7
23:32 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2078.codfw.wmnet with OS bullseye
23:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
23:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
23:29 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
23:28 rzl@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
23:28 rzl@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
23:27 rzl@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
23:25 rzl@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
23:22 rzl@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
22:34 zabe@deploy1003: Finished scap sync-world: T390384 (duration: 11m 08s)
22:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2080.codfw.wmnet with OS bullseye
22:23 zabe@deploy1003: Started scap sync-world: T390384
22:17 eileen: config revision changed from 47a5d384 to 7bf2c087
22:15 zabe@deploy1003: Finished scap sync-world: Backport for Activate nupwiki (T390384) (duration: 11m 54s)
22:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
22:11 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
22:11 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
22:10 eileen: update CiviCRM civicrm: revision 3ca2db06
22:09 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
22:08 zabe@deploy1003: zabe: Continuing with sync
22:07 zabe@deploy1003: zabe: Backport for Activate nupwiki (T390384) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2080.codfw.wmnet with reason: host reimage
22:04 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2080.codfw.wmnet with reason: host reimage
22:03 zabe@deploy1003: Started scap sync-world: Backport for Activate nupwiki (T390384)
22:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
22:02 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
22:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
22:00 zabe@deploy1003: Finished scap sync-world: Backport for Prepare nupwiki (T390384) (duration: 13m 15s)
21:54 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-f1-codfw.mgmt.codfw.wmnet
21:53 zabe@deploy1003: zabe: Continuing with sync
21:52 zabe@deploy1003: zabe: Backport for Prepare nupwiki (T390384) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2080
21:49 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2080
21:49 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2080
21:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2080.codfw.wmnet 127.16.192.10.in-addr.arpa 7.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:49 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2080.codfw.wmnet 127.16.192.10.in-addr.arpa 7.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:49 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2080 - bking@cumin2002"
21:49 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2080 - bking@cumin2002"
21:47 zabe@deploy1003: Started scap sync-world: Backport for Prepare nupwiki (T390384)
21:45 bking@cumin2002: START - Cookbook sre.dns.netbox
21:45 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2080
21:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2080.codfw.wmnet with OS bullseye
21:39 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2080.codfw.wmnet with OS bullseye
21:39 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2080.codfw.wmnet with OS bullseye
21:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2080 to cirrussearch2080
21:35 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2080
21:29 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for db1178.eqiad.wmnet: Renew puppet certificate - jhathaway@cumin1002
21:28 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f1-codfw - pt1979@cumin2002"
21:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f1-codfw - pt1979@cumin2002"
21:26 jhathaway@cumin1002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for db1178.eqiad.wmnet: Renew puppet certificate - jhathaway@cumin1002
21:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
21:24 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-f1-codfw.mgmt.codfw.wmnet
21:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2080
21:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:23 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2080 to cirrussearch2080 - bking@cumin2002"
21:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2080 to cirrussearch2080 - bking@cumin2002"
21:18 bking@cumin2002: START - Cookbook sre.dns.netbox
21:18 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2080 to cirrussearch2080
21:13 jhathaway: restarting puppetserver1002 to test crl
20:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
20:58 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
20:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
20:57 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
20:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
20:51 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
20:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
20:38 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2078']
20:35 jgleeson: payments-wiki upgraded from d250a3b8 to c6ba1f35
20:29 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2078']
20:29 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
20:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
20:22 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
20:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
20:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2076.codfw.wmnet with OS bullseye
20:13 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
19:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2076.codfw.wmnet with reason: host reimage
19:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
19:55 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
19:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
19:52 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
19:52 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2076.codfw.wmnet with reason: host reimage
19:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
19:48 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
19:48 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
19:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2076
19:38 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2076
19:35 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2076
19:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2076.codfw.wmnet 206.0.192.10.in-addr.arpa 6.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:35 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2076.codfw.wmnet 206.0.192.10.in-addr.arpa 6.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:35 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2076 - bking@cumin2002"
19:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2076 - bking@cumin2002"
19:30 bking@cumin2002: START - Cookbook sre.dns.netbox
19:29 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2076
19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating pdus in codfw - jhancock@cumin2002"
19:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2076.codfw.wmnet with OS bullseye
19:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating pdus in codfw - jhancock@cumin2002"
19:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2076 to cirrussearch2076
19:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2076
19:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2076
19:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2076 to cirrussearch2076 - bking@cumin2002"
19:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2076 to cirrussearch2076 - bking@cumin2002"
19:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:19 bking@cumin2002: START - Cookbook sre.dns.netbox
19:19 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2076 to cirrussearch2076
17:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2073.codfw.wmnet with OS bullseye
17:41 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
17:39 rzl@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
17:39 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f3-codfw
17:39 pt1979@cumin2002: START - Cookbook sre.network.tls for network device lsw1-f3-codfw
17:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
17:35 rzl@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
17:32 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2073.codfw.wmnet with reason: host reimage
17:28 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2073.codfw.wmnet with reason: host reimage
17:18 rzl@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
17:17 rzl@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-f3-codfw.mgmt.codfw.wmnet
17:14 rzl@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2073
17:14 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2073
17:13 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2073
17:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2073.codfw.wmnet 28.0.192.10.in-addr.arpa 8.2.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
17:13 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2073.codfw.wmnet 28.0.192.10.in-addr.arpa 8.2.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
17:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:13 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2073 - bking@cumin2002"
17:13 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2073 - bking@cumin2002"
17:09 rzl@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:09 bking@cumin2002: START - Cookbook sre.dns.netbox
17:08 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2073
17:08 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2073.codfw.wmnet with OS bullseye
17:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2073 to cirrussearch2073
17:04 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2073
17:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2073
17:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2073 to cirrussearch2073 - bking@cumin2002"
17:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2073 to cirrussearch2073 - bking@cumin2002"
16:58 bking@cumin2002: START - Cookbook sre.dns.netbox
16:58 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2073 to cirrussearch2073
16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f3-codfw - pt1979@cumin2002"
16:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f3-codfw - pt1979@cumin2002"
16:37 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:37 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-f3-codfw.mgmt.codfw.wmnet
16:24 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-codfw
16:24 pt1979@cumin2002: START - Cookbook sre.network.tls for network device ssw1-f1-codfw
16:21 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-f1-codfw.mgmt.codfw.wmnet
16:16 brett: Delete source packages for varnish in bullseye-wikimedia
16:08 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1045.eqiad.wmnet
16:08 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1045.eqiad.wmnet
16:08 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1044.eqiad.wmnet
16:08 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1044.eqiad.wmnet
16:08 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1043.eqiad.wmnet
16:08 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1043.eqiad.wmnet
15:58 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=eqiad,name=restbase1045.eqiad.wmnet
15:58 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=eqiad,name=restbase1044.eqiad.wmnet
15:58 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=eqiad,name=restbase1043.eqiad.wmnet
15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-f1-codfw - pt1979@cumin2002"
15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:45 moritzm: installing twitter-bootstrap3 security updates
15:45 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-f1-codfw - pt1979@cumin2002"
15:40 brett: remove libvarnishapi2 from bullseye-wikimedia main
15:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:39 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-f1-codfw.mgmt.codfw.wmnet
15:38 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
15:38 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
15:38 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
15:37 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
15:37 lucaswerkmeister-wmde@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
15:36 brett: remove varnish libvmod-netmapper libvmod-querysort libvmod-re2 varnish-modules libvarnishapi2 varnishkafka from buster-wikimedia
15:36 lucaswerkmeister-wmde@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
14:59 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
14:59 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
14:59 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
14:59 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
14:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75449 and previous config saved to /var/cache/conftool/dbconfig/20250424-144923-root.json
14:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75447 and previous config saved to /var/cache/conftool/dbconfig/20250424-143417-root.json
14:25 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Fix EntitySchema propertyType on Wikidata (T371196) (duration: 12m 11s)
14:19 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75446 and previous config saved to /var/cache/conftool/dbconfig/20250424-141911-root.json
14:18 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
14:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging2005']
14:17 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Fix EntitySchema propertyType on Wikidata (T371196) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:13 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Fix EntitySchema propertyType on Wikidata (T371196)
14:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
14:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kafka-logging2005']
14:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
14:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging2005']
14:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75445 and previous config saved to /var/cache/conftool/dbconfig/20250424-140406-root.json
13:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging2005']
13:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
13:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
13:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75444 and previous config saved to /var/cache/conftool/dbconfig/20250424-134900-root.json
13:43 taavi@dns3004: END - running authdns-update
13:40 taavi@dns3004: START - running authdns-update
13:33 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75443 and previous config saved to /var/cache/conftool/dbconfig/20250424-133355-root.json
13:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T391056)', diff saved to https://phabricator.wikimedia.org/P75442 and previous config saved to /var/cache/conftool/dbconfig/20250424-131928-fceratto.json
13:18 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75441 and previous config saved to /var/cache/conftool/dbconfig/20250424-131850-root.json
13:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P75440 and previous config saved to /var/cache/conftool/dbconfig/20250424-130421-fceratto.json
13:03 kart_: Updated cxserver to 2025-04-15-070132-production (T391289)
13:03 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75439 and previous config saved to /var/cache/conftool/dbconfig/20250424-130344-root.json
12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
12:58 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
12:55 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
12:54 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
12:53 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-fe1016.eqiad.wmnet with OS bullseye
12:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P75436 and previous config saved to /var/cache/conftool/dbconfig/20250424-124914-fceratto.json
12:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe1016.eqiad.wmnet with OS bullseye
12:48 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75435 and previous config saved to /var/cache/conftool/dbconfig/20250424-124838-root.json
12:36 ladsgroup@deploy1003: Finished scap sync-world: Backport for Add support for x3 db cluster (T351820) (duration: 14m 28s)
12:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T391056)', diff saved to https://phabricator.wikimedia.org/P75434 and previous config saved to /var/cache/conftool/dbconfig/20250424-123407-fceratto.json
12:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
12:26 ladsgroup@deploy1003: ladsgroup: Backport for Add support for x3 db cluster (T351820) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:21 ladsgroup@deploy1003: Started scap sync-world: Backport for Add support for x3 db cluster (T351820)
12:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T391056)', diff saved to https://phabricator.wikimedia.org/P75433 and previous config saved to /var/cache/conftool/dbconfig/20250424-121819-fceratto.json
12:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2216.codfw.wmnet with reason: Maintenance
12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T391056)', diff saved to https://phabricator.wikimedia.org/P75432 and previous config saved to /var/cache/conftool/dbconfig/20250424-121756-fceratto.json
12:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2046.codfw.wmnet with OS bookworm
12:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2047.codfw.wmnet with OS bookworm
12:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2048.codfw.wmnet with OS bookworm
12:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2032.codfw.wmnet with reason: Maintenance
12:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2032.codfw.wmnet with reason: Maintenance
12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2032 T391921', diff saved to https://phabricator.wikimedia.org/P75431 and previous config saved to /var/cache/conftool/dbconfig/20250424-121152-marostegui.json
12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P75430 and previous config saved to /var/cache/conftool/dbconfig/20250424-120249-fceratto.json
11:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P75429 and previous config saved to /var/cache/conftool/dbconfig/20250424-114742-fceratto.json
11:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T391056)', diff saved to https://phabricator.wikimedia.org/P75428 and previous config saved to /var/cache/conftool/dbconfig/20250424-113234-fceratto.json
11:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T391056)', diff saved to https://phabricator.wikimedia.org/P75427 and previous config saved to /var/cache/conftool/dbconfig/20250424-111625-fceratto.json
11:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2212.codfw.wmnet with reason: Maintenance
11:11 moritzm: installing python-urllib3 security updates
11:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2202.codfw.wmnet with reason: Maintenance
11:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75426 and previous config saved to /var/cache/conftool/dbconfig/20250424-110230-fceratto.json
10:47 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) virt.cloudgw.eqiad1.wikimediacloud.org on all recursors
10:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P75425 and previous config saved to /var/cache/conftool/dbconfig/20250424-104723-fceratto.json
10:47 aborrero@cumin1002: START - Cookbook sre.dns.wipe-cache virt.cloudgw.eqiad1.wikimediacloud.org on all recursors
10:47 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:47 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
10:47 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
10:33 aborrero@cumin1002: START - Cookbook sre.dns.netbox
10:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P75424 and previous config saved to /var/cache/conftool/dbconfig/20250424-103217-fceratto.json
10:29 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudinstances2b-gw.openstack.eqiad1.wikimediacloud.org on all recursors
10:29 aborrero@cumin1002: START - Cookbook sre.dns.wipe-cache cloudinstances2b-gw.openstack.eqiad1.wikimediacloud.org on all recursors
10:27 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
10:26 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:26 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: neutron updates - aborrero@cumin1002"
10:26 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: neutron updates - aborrero@cumin1002"
10:22 aborrero@cumin1002: START - Cookbook sre.dns.netbox
10:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75423 and previous config saved to /var/cache/conftool/dbconfig/20250424-101710-fceratto.json
10:04 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
10:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75422 and previous config saved to /var/cache/conftool/dbconfig/20250424-100206-fceratto.json
10:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2188.codfw.wmnet with reason: Maintenance
10:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T391056)', diff saved to https://phabricator.wikimedia.org/P75421 and previous config saved to /var/cache/conftool/dbconfig/20250424-100143-fceratto.json
09:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
09:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P75420 and previous config saved to /var/cache/conftool/dbconfig/20250424-094635-fceratto.json
09:41 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
09:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
09:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P75419 and previous config saved to /var/cache/conftool/dbconfig/20250424-093128-fceratto.json
09:27 Emperor: depool thanos-fe200[1-3] pending decommissioning T391352
09:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T391056)', diff saved to https://phabricator.wikimedia.org/P75418 and previous config saved to /var/cache/conftool/dbconfig/20250424-091622-fceratto.json
08:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T391056)', diff saved to https://phabricator.wikimedia.org/P75417 and previous config saved to /var/cache/conftool/dbconfig/20250424-085933-fceratto.json
08:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2176.codfw.wmnet with reason: Maintenance
08:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75416 and previous config saved to /var/cache/conftool/dbconfig/20250424-085911-fceratto.json
08:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
08:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P75415 and previous config saved to /var/cache/conftool/dbconfig/20250424-084404-fceratto.json
08:40 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75414 and previous config saved to /var/cache/conftool/dbconfig/20250424-084004-root.json
08:39 moritzm: installing reprepro bugfix updates from Bookworm point release
08:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P75413 and previous config saved to /var/cache/conftool/dbconfig/20250424-082857-fceratto.json
08:24 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75412 and previous config saved to /var/cache/conftool/dbconfig/20250424-082458-root.json
08:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75411 and previous config saved to /var/cache/conftool/dbconfig/20250424-081350-fceratto.json
08:09 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75410 and previous config saved to /var/cache/conftool/dbconfig/20250424-080953-root.json
07:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75409 and previous config saved to /var/cache/conftool/dbconfig/20250424-075547-fceratto.json
07:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2174.codfw.wmnet with reason: Maintenance
07:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T391056)', diff saved to https://phabricator.wikimedia.org/P75408 and previous config saved to /var/cache/conftool/dbconfig/20250424-075524-fceratto.json
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75407 and previous config saved to /var/cache/conftool/dbconfig/20250424-075448-root.json
07:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P75405 and previous config saved to /var/cache/conftool/dbconfig/20250424-074016-fceratto.json
07:39 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75404 and previous config saved to /var/cache/conftool/dbconfig/20250424-073943-root.json
07:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P75403 and previous config saved to /var/cache/conftool/dbconfig/20250424-072508-fceratto.json
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75402 and previous config saved to /var/cache/conftool/dbconfig/20250424-072439-root.json
07:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T391056)', diff saved to https://phabricator.wikimedia.org/P75401 and previous config saved to /var/cache/conftool/dbconfig/20250424-071001-fceratto.json
07:09 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75400 and previous config saved to /var/cache/conftool/dbconfig/20250424-070933-root.json
06:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75399 and previous config saved to /var/cache/conftool/dbconfig/20250424-065428-root.json
06:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
06:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T391056)', diff saved to https://phabricator.wikimedia.org/P75398 and previous config saved to /var/cache/conftool/dbconfig/20250424-065227-fceratto.json
06:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
06:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
06:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2173.codfw.wmnet with reason: Maintenance
06:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75397 and previous config saved to /var/cache/conftool/dbconfig/20250424-065149-fceratto.json
06:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1257.eqiad.wmnet with reason: Maintenance
06:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1256.eqiad.wmnet with reason: Maintenance
06:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1255.eqiad.wmnet with reason: Maintenance
06:39 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75396 and previous config saved to /var/cache/conftool/dbconfig/20250424-063922-root.json
06:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P75395 and previous config saved to /var/cache/conftool/dbconfig/20250424-063643-fceratto.json
06:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1027.eqiad.wmnet with reason: Maintenance
06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1027 T391921', diff saved to https://phabricator.wikimedia.org/P75394 and previous config saved to /var/cache/conftool/dbconfig/20250424-063345-marostegui.json
06:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P75393 and previous config saved to /var/cache/conftool/dbconfig/20250424-062135-fceratto.json
06:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75392 and previous config saved to /var/cache/conftool/dbconfig/20250424-060628-fceratto.json
05:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75391 and previous config saved to /var/cache/conftool/dbconfig/20250424-054831-fceratto.json
05:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2170.codfw.wmnet with reason: Maintenance
05:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T391056)', diff saved to https://phabricator.wikimedia.org/P75390 and previous config saved to /var/cache/conftool/dbconfig/20250424-054808-fceratto.json
05:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P75389 and previous config saved to /var/cache/conftool/dbconfig/20250424-053301-fceratto.json
05:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P75388 and previous config saved to /var/cache/conftool/dbconfig/20250424-051753-fceratto.json
05:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T391056)', diff saved to https://phabricator.wikimedia.org/P75387 and previous config saved to /var/cache/conftool/dbconfig/20250424-050247-fceratto.json
04:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T391056)', diff saved to https://phabricator.wikimedia.org/P75386 and previous config saved to /var/cache/conftool/dbconfig/20250424-044153-fceratto.json
04:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2153.codfw.wmnet with reason: Maintenance
04:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T391056)', diff saved to https://phabricator.wikimedia.org/P75385 and previous config saved to /var/cache/conftool/dbconfig/20250424-044130-fceratto.json
04:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P75384 and previous config saved to /var/cache/conftool/dbconfig/20250424-042623-fceratto.json
04:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P75383 and previous config saved to /var/cache/conftool/dbconfig/20250424-041116-fceratto.json
03:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T391056)', diff saved to https://phabricator.wikimedia.org/P75382 and previous config saved to /var/cache/conftool/dbconfig/20250424-035609-fceratto.json
03:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T391056)', diff saved to https://phabricator.wikimedia.org/P75381 and previous config saved to /var/cache/conftool/dbconfig/20250424-033724-fceratto.json
03:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2146.codfw.wmnet with reason: Maintenance
03:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T391056)', diff saved to https://phabricator.wikimedia.org/P75380 and previous config saved to /var/cache/conftool/dbconfig/20250424-033701-fceratto.json
03:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P75379 and previous config saved to /var/cache/conftool/dbconfig/20250424-032154-fceratto.json
03:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P75378 and previous config saved to /var/cache/conftool/dbconfig/20250424-030647-fceratto.json
02:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T391056)', diff saved to https://phabricator.wikimedia.org/P75377 and previous config saved to /var/cache/conftool/dbconfig/20250424-025140-fceratto.json
02:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T391056)', diff saved to https://phabricator.wikimedia.org/P75376 and previous config saved to /var/cache/conftool/dbconfig/20250424-023220-fceratto.json
02:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2145.codfw.wmnet with reason: Maintenance
02:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2141.codfw.wmnet with reason: Maintenance
02:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
02:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T391056)', diff saved to https://phabricator.wikimedia.org/P75375 and previous config saved to /var/cache/conftool/dbconfig/20250424-020328-fceratto.json
01:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P75374 and previous config saved to /var/cache/conftool/dbconfig/20250424-014821-fceratto.json
01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e3-codfw
01:40 pt1979@cumin2002: START - Cookbook sre.network.tls for network device lsw1-e3-codfw
01:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2049.codfw.wmnet with OS bookworm
01:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:33 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-e3-codfw.mgmt.codfw.wmnet
01:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P75373 and previous config saved to /var/cache/conftool/dbconfig/20250424-013313-fceratto.json
01:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2045.codfw.wmnet with OS bookworm
01:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2049.codfw.wmnet with reason: host reimage
01:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T391056)', diff saved to https://phabricator.wikimedia.org/P75372 and previous config saved to /var/cache/conftool/dbconfig/20250424-011807-fceratto.json
01:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2049.codfw.wmnet with reason: host reimage
01:10 eileen: config revision changed from bfbce54f to 47a5d384
01:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T391056)', diff saved to https://phabricator.wikimedia.org/P75371 and previous config saved to /var/cache/conftool/dbconfig/20250424-010217-fceratto.json
01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e3-codfw - pt1979@cumin2002"
01:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1251.eqiad.wmnet with reason: Maintenance
01:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2049.codfw.wmnet with OS bookworm
01:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
01:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
01:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2046.codfw.wmnet with OS bookworm
01:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
00:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e3-codfw - pt1979@cumin2002"
00:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:54 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:54 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
00:54 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:54 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-e3-codfw.mgmt.codfw.wmnet
00:53 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e1-codfw
00:53 pt1979@cumin2002: START - Cookbook sre.network.tls for network device lsw1-e1-codfw
00:52 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-e1-codfw.mgmt.codfw.wmnet
00:46 eileen: civicrm upgraded from b3038510 to c8946ea5
00:45 eileen: config revision changed from c635ed3c to bfbce54f
00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance
00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-codfw
00:43 pt1979@cumin2002: START - Cookbook sre.network.tls for network device ssw1-e1-codfw
00:32 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-e1-codfw.mgmt.codfw.wmnet
00:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1239.eqiad.wmnet with reason: Maintenance
00:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T391056)', diff saved to https://phabricator.wikimedia.org/P75370 and previous config saved to /var/cache/conftool/dbconfig/20250424-003043-fceratto.json
00:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e1-codfw - pt1979@cumin2002"
00:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e1-codfw - pt1979@cumin2002"
00:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:17 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-e1-codfw.mgmt.codfw.wmnet
00:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-e1-codfw.mgmt.codfw.wmnet
00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for lsw1-e1-codfw - pt1979@cumin2002"
00:17 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for lsw1-e1-codfw - pt1979@cumin2002"
00:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P75369 and previous config saved to /var/cache/conftool/dbconfig/20250424-001535-fceratto.json
00:12 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P75368 and previous config saved to /var/cache/conftool/dbconfig/20250424-000028-fceratto.json

2025-04-23

23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:56 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-e1-codfw.mgmt.codfw.wmnet
23:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T391056)', diff saved to https://phabricator.wikimedia.org/P75367 and previous config saved to /var/cache/conftool/dbconfig/20250423-234521-fceratto.json
23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e1-codfw - pt1979@cumin2002"
23:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e1-codfw - pt1979@cumin2002"
23:37 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:37 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-e1-codfw.mgmt.codfw.wmnet
23:36 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-e1-codfw.mgmt.codfw.wmnet
22:58 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-e1-codfw - pt1979@cumin2002"
22:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-e1-codfw - pt1979@cumin2002"
22:54 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release
22:53 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:53 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-e1-codfw.mgmt.codfw.wmnet
22:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-e1-codfw.mgmt.codfw.wmnet
22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-e1-codfw - pt1979@cumin2002"
22:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-e1-codfw - pt1979@cumin2002"
22:46 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2061.codfw.wmnet with OS bullseye
22:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T391056)', diff saved to https://phabricator.wikimedia.org/P75366 and previous config saved to /var/cache/conftool/dbconfig/20250423-223359-fceratto.json
22:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1235.eqiad.wmnet with reason: Maintenance
22:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T391056)', diff saved to https://phabricator.wikimedia.org/P75365 and previous config saved to /var/cache/conftool/dbconfig/20250423-223336-fceratto.json
22:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2061.codfw.wmnet with reason: host reimage
22:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P75364 and previous config saved to /var/cache/conftool/dbconfig/20250423-221828-fceratto.json
22:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2061.codfw.wmnet with reason: host reimage
22:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2096.codfw.wmnet with OS bullseye
22:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P75363 and previous config saved to /var/cache/conftool/dbconfig/20250423-220321-fceratto.json
22:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2061
22:03 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2061
22:02 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2061
22:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2061.codfw.wmnet 143.0.192.10.in-addr.arpa 3.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
22:02 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2061.codfw.wmnet 143.0.192.10.in-addr.arpa 3.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
22:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:02 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2061 - bking@cumin2002"
22:02 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2061 - bking@cumin2002"
21:58 bking@cumin2002: START - Cookbook sre.dns.netbox
21:57 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2061
21:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2061.codfw.wmnet with OS bullseye
21:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2061 to cirrussearch2061
21:55 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2061
21:54 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2061
21:54 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:54 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2061 to cirrussearch2061 - bking@cumin2002"
21:54 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2061 to cirrussearch2061 - bking@cumin2002"
21:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2096.codfw.wmnet with reason: host reimage
21:50 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2096.codfw.wmnet with reason: host reimage
21:50 bking@cumin2002: START - Cookbook sre.dns.netbox
21:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2061 to cirrussearch2061
21:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T391056)', diff saved to https://phabricator.wikimedia.org/P75362 and previous config saved to /var/cache/conftool/dbconfig/20250423-214814-fceratto.json
21:37 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=eqiad,name=restbase1043.eqiad.wmnet
21:36 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=eqiad,name=restbase1028.eqiad.wmnet
21:32 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2096
21:32 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2096
21:32 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2096
21:32 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2096.codfw.wmnet 233.16.192.10.in-addr.arpa 3.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:32 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2096.codfw.wmnet 233.16.192.10.in-addr.arpa 3.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:32 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:32 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2096 - bking@cumin2002"
21:32 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2096 - bking@cumin2002"
21:28 bking@cumin2002: START - Cookbook sre.dns.netbox
21:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T391056)', diff saved to https://phabricator.wikimedia.org/P75361 and previous config saved to /var/cache/conftool/dbconfig/20250423-212818-fceratto.json
21:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1234.eqiad.wmnet with reason: Maintenance
21:28 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2096
21:28 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2096.codfw.wmnet with OS bullseye
21:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T391056)', diff saved to https://phabricator.wikimedia.org/P75360 and previous config saved to /var/cache/conftool/dbconfig/20250423-212756-fceratto.json
21:15 jforrester@deploy1003: Finished scap sync-world: Backport for [wikifunctionswiki] Enable Parsoid in wikitext articles, tests: Add a Wikifunctions-related test suite (duration: 11m 33s)
21:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P75359 and previous config saved to /var/cache/conftool/dbconfig/20250423-211249-fceratto.json
20:57 dancy@deploy1003: Installation of scap version "4.155.0" completed for 186 hosts
20:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P75358 and previous config saved to /var/cache/conftool/dbconfig/20250423-205743-fceratto.json
20:53 dancy@deploy1003: Installing scap version "4.155.0" for 186 host(s)
20:49 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2096']
20:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cirrussearch2096']
20:45 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on aphlict2001.codfw.wmnet with reason: Bookworm Re-image
20:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T391056)', diff saved to https://phabricator.wikimedia.org/P75357 and previous config saved to /var/cache/conftool/dbconfig/20250423-204236-fceratto.json
20:41 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2096']
20:38 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cirrussearch2096']
20:33 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2096']
20:22 cscott@deploy1003: Finished scap sync-world: Backport for Turn on ParsoidFragmentInput; remove unneeded ParsoidFragmentSupport config (T268144) (duration: 15m 19s)
20:22 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@4a7644d]: Deploy hotfix for T391283. (duration: 01m 04s)
20:21 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@4a7644d]: Deploy hotfix for T391283.
20:15 cscott@deploy1003: cscott: Continuing with sync
20:12 cscott@deploy1003: cscott: Backport for Turn on ParsoidFragmentInput; remove unneeded ParsoidFragmentSupport config (T268144) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:08 brett: import libvmod-netmapper-1.10-1 into bullseye-wikimedia (T392533)
20:07 cscott@deploy1003: Started scap sync-world: Backport for Turn on ParsoidFragmentInput; remove unneeded ParsoidFragmentSupport config (T268144)
20:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T391056)', diff saved to https://phabricator.wikimedia.org/P75356 and previous config saved to /var/cache/conftool/dbconfig/20250423-200358-fceratto.json
20:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1232.eqiad.wmnet with reason: Maintenance
20:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75355 and previous config saved to /var/cache/conftool/dbconfig/20250423-200346-fceratto.json
19:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P75354 and previous config saved to /var/cache/conftool/dbconfig/20250423-194839-fceratto.json
19:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P75353 and previous config saved to /var/cache/conftool/dbconfig/20250423-193332-fceratto.json
19:31 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@7312379]: Release DAGs for T391283. (duration: 00m 54s)
19:30 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@7312379]: Release DAGs for T391283.
19:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75352 and previous config saved to /var/cache/conftool/dbconfig/20250423-191825-fceratto.json
19:09 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
19:01 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75351 and previous config saved to /var/cache/conftool/dbconfig/20250423-190116-fceratto.json
19:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1219.eqiad.wmnet with reason: Maintenance
19:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T391056)', diff saved to https://phabricator.wikimedia.org/P75350 and previous config saved to /var/cache/conftool/dbconfig/20250423-190054-fceratto.json
18:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-e1-codfw - pt1979@cumin2002"
18:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-e1-codfw - pt1979@cumin2002"
18:56 brett: import libvmod-wmfuniq-0.1.0~deb12u1 and wmfuniq-keygen-0.1.0~deb12u1 into bookworm-wikimedia (T392059)
18:56 brett: import libvmod-wmfuniq-0.1.0~deb11u1 and wmfuniq-keygen-0.1.0~deb11u1 into bullseye-wikimedia (T392059)
18:51 dzahn@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: security release
18:51 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
18:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
18:50 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-e1-codfw.mgmt.codfw.wmnet
18:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P75349 and previous config saved to /var/cache/conftool/dbconfig/20250423-184547-fceratto.json
18:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-e1-codfw.mgmt.codfw.wmnet
18:43 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-e1-codfw.mgmt.codfw.wmnet
18:43 brett: remove libvmod-wmfuniq-0.1.0 and wmfuniq-keygen-0.1.0 from bullseye-wikimedia (T392059)
18:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P75348 and previous config saved to /var/cache/conftool/dbconfig/20250423-183040-fceratto.json
18:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T391056)', diff saved to https://phabricator.wikimedia.org/P75347 and previous config saved to /var/cache/conftool/dbconfig/20250423-181533-fceratto.json
18:09 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2113.codfw.wmnet with OS bullseye
18:04 brett: import libvmod-wmfuniq 0.1.0 into bullseye-wikimedia (T392059)
17:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T391056)', diff saved to https://phabricator.wikimedia.org/P75346 and previous config saved to /var/cache/conftool/dbconfig/20250423-175948-fceratto.json
17:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1218.eqiad.wmnet with reason: Maintenance
17:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T391056)', diff saved to https://phabricator.wikimedia.org/P75345 and previous config saved to /var/cache/conftool/dbconfig/20250423-175926-fceratto.json
17:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P75344 and previous config saved to /var/cache/conftool/dbconfig/20250423-174419-fceratto.json
17:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:39 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2071.codfw.wmnet|cirrussearch2098.codfw.wmnet|cirrussearch2099.codfw.wmnet|cirrussearch2101.codfw.wmnet|cirrussearch2102.codfw.wmnet|cirrussearch2113.codfw.wmnet
17:38 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P75343 and previous config saved to /var/cache/conftool/dbconfig/20250423-172912-fceratto.json
17:21 brett: Remove libvarnishapi-dev from bookworm-wikimedia
17:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T391056)', diff saved to https://phabricator.wikimedia.org/P75342 and previous config saved to /var/cache/conftool/dbconfig/20250423-171404-fceratto.json
17:04 brett: Remove varnish libvmod-re2 libvmod-netmapper libvmod-querysort libvarnishapi2 varnish-modules varnishkafka from bookworm-wikimedia
16:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T391056)', diff saved to https://phabricator.wikimedia.org/P75341 and previous config saved to /var/cache/conftool/dbconfig/20250423-165634-fceratto.json
16:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1207.eqiad.wmnet with reason: Maintenance
16:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75340 and previous config saved to /var/cache/conftool/dbconfig/20250423-165611-fceratto.json
16:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2113
16:52 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2113
16:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2113.codfw.wmnet with OS bullseye
16:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2113 to cirrussearch2113
16:50 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2113
16:50 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2113
16:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:50 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2113 to cirrussearch2113 - bking@cumin2002"
16:47 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2113 to cirrussearch2113 - bking@cumin2002"
16:43 bking@cumin2002: START - Cookbook sre.dns.netbox
16:42 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2113 to cirrussearch2113
16:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P75338 and previous config saved to /var/cache/conftool/dbconfig/20250423-164105-fceratto.json
16:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2102.codfw.wmnet with OS bullseye
16:28 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:28 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:27 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Enable temporary-account-viewer group on all WMF production wikis (T390942 T387205) (duration: 11m 11s)
16:27 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P75337 and previous config saved to /var/cache/conftool/dbconfig/20250423-162558-fceratto.json
16:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
16:24 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75336 and previous config saved to /var/cache/conftool/dbconfig/20250423-162434-root.json
16:24 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
16:23 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1184
16:23 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1184
16:21 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:21 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
16:21 dreamyjazz@deploy1003: dreamyjazz: Backport for Enable temporary-account-viewer group on all WMF production wikis (T390942 T387205) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:19 vriley@cumin1002: START - Cookbook sre.dns.netbox
16:18 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
16:16 dreamyjazz@deploy1003: Started scap sync-world: Backport for Enable temporary-account-viewer group on all WMF production wikis (T390942 T387205)
16:13 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
16:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75335 and previous config saved to /var/cache/conftool/dbconfig/20250423-161051-fceratto.json
16:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2101.codfw.wmnet with OS bullseye
16:10 dreamyjazz@deploy1003: dreamyjazz: Backport for Enable temporary-account-viewer group on all WMF production wikis (T390942 T387205) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2102.codfw.wmnet with reason: host reimage
16:09 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
16:09 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75334 and previous config saved to /var/cache/conftool/dbconfig/20250423-160928-root.json
16:08 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
16:07 dreamyjazz@deploy1003: Started scap sync-world: Backport for Enable temporary-account-viewer group on all WMF production wikis (T390942 T387205)
16:06 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2102.codfw.wmnet with reason: host reimage
16:04 vgutierrez: restarting pybal on lvs201[34]
16:01 dancy@deploy1003: Installation of scap version "4.154.0" completed for 2 hosts
16:00 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
16:00 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
16:00 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
16:00 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
16:00 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
15:59 dancy@deploy1003: Installing scap version "4.154.0" for 2 host(s)
15:58 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
15:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75333 and previous config saved to /var/cache/conftool/dbconfig/20250423-155423-root.json
15:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75332 and previous config saved to /var/cache/conftool/dbconfig/20250423-155423-fceratto.json
15:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1206.eqiad.wmnet with reason: Maintenance
15:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T391056)', diff saved to https://phabricator.wikimedia.org/P75331 and previous config saved to /var/cache/conftool/dbconfig/20250423-155401-fceratto.json
15:52 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
15:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2102
15:50 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2102
15:48 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2102
15:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2102.codfw.wmnet 221.32.192.10.in-addr.arpa 1.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:48 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2102.codfw.wmnet 221.32.192.10.in-addr.arpa 1.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2102 - bking@cumin2002"
15:48 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2102 - bking@cumin2002"
15:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2101.codfw.wmnet with reason: host reimage
15:44 bking@cumin2002: START - Cookbook sre.dns.netbox
15:43 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2102
15:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2102.codfw.wmnet with OS bullseye
15:43 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2101.codfw.wmnet with reason: host reimage
15:41 ladsgroup@dns1004: END - running authdns-update
15:39 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75327 and previous config saved to /var/cache/conftool/dbconfig/20250423-153918-root.json
15:39 ladsgroup@dns1004: START - running authdns-update
15:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P75326 and previous config saved to /var/cache/conftool/dbconfig/20250423-153854-fceratto.json
15:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2102 to cirrussearch2102
15:29 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2102
15:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2102
15:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2102 to cirrussearch2102 - bking@cumin2002"
15:28 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2102 to cirrussearch2102 - bking@cumin2002"
15:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2101
15:26 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2101
15:25 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2101
15:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2101.codfw.wmnet 220.32.192.10.in-addr.arpa 0.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:25 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2101.codfw.wmnet 220.32.192.10.in-addr.arpa 0.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2101 - bking@cumin2002"
15:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2101 - bking@cumin2002"
15:24 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75324 and previous config saved to /var/cache/conftool/dbconfig/20250423-152412-root.json
15:24 bking@cumin2002: START - Cookbook sre.dns.netbox
15:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2102 to cirrussearch2102
15:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P75323 and previous config saved to /var/cache/conftool/dbconfig/20250423-152347-fceratto.json
15:09 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwdebug1001.eqiad.wmnet with OS bullseye
15:09 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75322 and previous config saved to /var/cache/conftool/dbconfig/20250423-150907-root.json
15:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T391056)', diff saved to https://phabricator.wikimedia.org/P75321 and previous config saved to /var/cache/conftool/dbconfig/20250423-150839-fceratto.json
14:55 bking@cumin2002: START - Cookbook sre.dns.netbox
14:54 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2101
14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2101.codfw.wmnet with OS bullseye
14:54 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2101.codfw.wmnet on all recursors
14:54 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2101.codfw.wmnet on all recursors
14:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75320 and previous config saved to /var/cache/conftool/dbconfig/20250423-145401-root.json
14:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2101 to cirrussearch2101
14:52 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2101
14:51 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2101
14:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2101 to cirrussearch2101 - bking@cumin2002"
14:48 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Remove wgCheckUserCentralIndexRangesToExclude definition (T389055) (duration: 11m 00s)
14:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T391056)', diff saved to https://phabricator.wikimedia.org/P75319 and previous config saved to /var/cache/conftool/dbconfig/20250423-144811-fceratto.json
14:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
14:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1196.eqiad.wmnet with reason: Maintenance
14:47 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2101 to cirrussearch2101 - bking@cumin2002"
14:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T391056)', diff saved to https://phabricator.wikimedia.org/P75318 and previous config saved to /var/cache/conftool/dbconfig/20250423-144741-fceratto.json
14:42 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
14:42 dreamyjazz@deploy1003: dreamyjazz: Backport for Remove wgCheckUserCentralIndexRangesToExclude definition (T389055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2099.codfw.wmnet with OS bullseye
14:38 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75317 and previous config saved to /var/cache/conftool/dbconfig/20250423-143856-root.json
14:37 dreamyjazz@deploy1003: Started scap sync-world: Backport for Remove wgCheckUserCentralIndexRangesToExclude definition (T389055)
14:37 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug1001.eqiad.wmnet with reason: host reimage
14:33 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug1001.eqiad.wmnet with reason: host reimage
14:33 bking@cumin2002: START - Cookbook sre.dns.netbox
14:32 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2101 to cirrussearch2101
14:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P75316 and previous config saved to /var/cache/conftool/dbconfig/20250423-143235-fceratto.json
14:30 jforrester@deploy1003: Finished scap sync-world: Backport for ZString: Don't explode if we're handed an array with odd contents (T392370), API: Don't try to read fetchAllZLanguageCodes() in client-mode Action APIs either (T392014) (duration: 11m 29s)
14:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2098.codfw.wmnet with OS bullseye
14:23 jforrester@deploy1003: jforrester: Continuing with sync
14:23 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75315 and previous config saved to /var/cache/conftool/dbconfig/20250423-142350-root.json
14:23 jforrester@deploy1003: jforrester: Backport for ZString: Don't explode if we're handed an array with odd contents (T392370), API: Don't try to read fetchAllZLanguageCodes() in client-mode Action APIs either (T392014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2099.codfw.wmnet with reason: host reimage
14:19 jforrester@deploy1003: Started scap sync-world: Backport for ZString: Don't explode if we're handed an array with odd contents (T392370), API: Don't try to read fetchAllZLanguageCodes() in client-mode Action APIs either (T392014)
14:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P75314 and previous config saved to /var/cache/conftool/dbconfig/20250423-141728-fceratto.json
14:15 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2099.codfw.wmnet with reason: host reimage
14:14 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mwdebug1001.eqiad.wmnet with OS bullseye
14:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1032.eqiad.wmnet with reason: Maintenance
14:12 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:12 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:12 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:10 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:10 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1032', diff saved to https://phabricator.wikimedia.org/P75313 and previous config saved to /var/cache/conftool/dbconfig/20250423-141000-marostegui.json
14:07 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:07 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:06 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:06 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:05 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:03 samtar@deploy1003: Finished scap sync-world: Backport for Remove temporary '-php8' and '-k8s' suffixes from ArcLamp pipeline (T391516) (duration: 13m 36s)
14:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T391056)', diff saved to https://phabricator.wikimedia.org/P75312 and previous config saved to /var/cache/conftool/dbconfig/20250423-140221-fceratto.json
13:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2099
13:59 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2099
13:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2099
13:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2099.codfw.wmnet 218.32.192.10.in-addr.arpa 8.1.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2099.codfw.wmnet 218.32.192.10.in-addr.arpa 8.1.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2099 - bking@cumin2002"
13:58 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2099 - bking@cumin2002"
13:57 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug1001.eqiad.wmnet
13:56 samtar@deploy1003: ori, samtar: Continuing with sync
13:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
13:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2098.codfw.wmnet with reason: host reimage
13:54 bking@cumin2002: START - Cookbook sre.dns.netbox
13:54 samtar@deploy1003: ori, samtar: Backport for Remove temporary '-php8' and '-k8s' suffixes from ArcLamp pipeline (T391516) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2099
13:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2099.codfw.wmnet with OS bullseye
13:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2098.codfw.wmnet with reason: host reimage
13:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2099.codfw.wmnet on all recursors
13:53 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2099.codfw.wmnet on all recursors
13:49 samtar@deploy1003: Started scap sync-world: Backport for Remove temporary '-php8' and '-k8s' suffixes from ArcLamp pipeline (T391516)
13:47 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
13:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2099 to cirrussearch2099
13:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2099
13:43 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2099
13:43 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:43 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2099 to cirrussearch2099 - bking@cumin2002"
13:43 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2099 to cirrussearch2099 - bking@cumin2002"
13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T391056)', diff saved to https://phabricator.wikimedia.org/P75310 and previous config saved to /var/cache/conftool/dbconfig/20250423-134142-fceratto.json
13:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: Maintenance
13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T391056)', diff saved to https://phabricator.wikimedia.org/P75309 and previous config saved to /var/cache/conftool/dbconfig/20250423-134131-fceratto.json
13:39 bking@cumin2002: START - Cookbook sre.dns.netbox
13:38 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2099 to cirrussearch2099
13:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2098
13:36 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2098
13:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2098
13:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2098.codfw.wmnet 217.32.192.10.in-addr.arpa 7.1.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:36 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2098.codfw.wmnet 217.32.192.10.in-addr.arpa 7.1.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:36 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2098 - bking@cumin2002"
13:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2098 - bking@cumin2002"
13:34 tgr_: T392462 Ran fixStuckGlobalRename.php for two users
13:31 bking@cumin2002: START - Cookbook sre.dns.netbox
13:31 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2098
13:31 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2098.codfw.wmnet with OS bullseye
13:31 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2098.codfw.wmnet on all recursors
13:31 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2098.codfw.wmnet on all recursors
13:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2098 to cirrussearch2098
13:29 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
13:29 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
13:29 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
13:29 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2098
13:28 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
13:28 samtar@deploy1003: Finished scap sync-world: Backport for Add throttle exemptions for some Edit-a-thons (T391764 T391999) (duration: 11m 42s)
13:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2098
13:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2098 to cirrussearch2098 - bking@cumin2002"
13:26 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2098 to cirrussearch2098 - bking@cumin2002"
13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P75308 and previous config saved to /var/cache/conftool/dbconfig/20250423-132624-fceratto.json
13:23 moritzm: installing Linux 6.1.133 on Bookworm hosts
13:22 bking@cumin2002: START - Cookbook sre.dns.netbox
13:22 samtar@deploy1003: superpes, samtar: Continuing with sync
13:22 samtar@deploy1003: superpes, samtar: Backport for Add throttle exemptions for some Edit-a-thons (T391764 T391999) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:21 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2098 to cirrussearch2098
13:18 TheresNoTime: samtar@deploy1003 Finished scap sync-world: Backport for SUL3: Remove unused CentralAuthSharedDomainPrefix config setting, Simplify CentralAuthEnableSul3 config setting value (duration: 11m 28s)
13:05 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1003.eqiad.wmnet
13:04 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
13:03 samtar@deploy1003: Started scap sync-world: Backport for SUL3: Remove unused CentralAuthSharedDomainPrefix config setting, Simplify CentralAuthEnableSul3 config setting value
13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1002.eqiad.wmnet
13:02 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
13:01 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
12:57 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
12:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T391056)', diff saved to https://phabricator.wikimedia.org/P75306 and previous config saved to /var/cache/conftool/dbconfig/20250423-125611-fceratto.json
12:54 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
12:44 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
12:44 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
12:42 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1004.eqiad.wmnet
12:41 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
12:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T391056)', diff saved to https://phabricator.wikimedia.org/P75305 and previous config saved to /var/cache/conftool/dbconfig/20250423-123640-fceratto.json
12:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1186.eqiad.wmnet with reason: Maintenance
12:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T391056)', diff saved to https://phabricator.wikimedia.org/P75304 and previous config saved to /var/cache/conftool/dbconfig/20250423-123617-fceratto.json
12:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:35 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1004.eqiad.wmnet
12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc7 T391454', diff saved to https://phabricator.wikimedia.org/P75303 and previous config saved to /var/cache/conftool/dbconfig/20250423-122924-marostegui.json
12:27 hashar: gerrit: removed obsolete 1024px-Sea_and_sky_light.cache.jpg file from all servers. File was replaced by 2006-12-28_10h26_33.jpg # T392479
12:26 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
12:21 cmooney@dns2005: END - running authdns-update
12:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P75302 and previous config saved to /var/cache/conftool/dbconfig/20250423-122110-fceratto.json
12:19 cmooney@dns2005: START - running authdns-update
12:19 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
12:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2017.codfw.wmnet,pc1017.eqiad.wmnet with reason: Maintenance
12:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc7 T391454', diff saved to https://phabricator.wikimedia.org/P75301 and previous config saved to /var/cache/conftool/dbconfig/20250423-121722-marostegui.json
12:17 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
12:10 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
12:10 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
12:09 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
12:09 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
12:08 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
12:08 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
12:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P75298 and previous config saved to /var/cache/conftool/dbconfig/20250423-120602-fceratto.json
11:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T391056)', diff saved to https://phabricator.wikimedia.org/P75297 and previous config saved to /var/cache/conftool/dbconfig/20250423-115054-fceratto.json
11:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T391056)', diff saved to https://phabricator.wikimedia.org/P75296 and previous config saved to /var/cache/conftool/dbconfig/20250423-113200-fceratto.json
11:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T391056)', diff saved to https://phabricator.wikimedia.org/P75295 and previous config saved to /var/cache/conftool/dbconfig/20250423-113148-fceratto.json
11:27 moritzm: installing libxml2 security updates
11:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P75294 and previous config saved to /var/cache/conftool/dbconfig/20250423-111641-fceratto.json
11:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P75293 and previous config saved to /var/cache/conftool/dbconfig/20250423-110134-fceratto.json
10:53 moritzm: installing php8.2 security updates
10:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T391056)', diff saved to https://phabricator.wikimedia.org/P75292 and previous config saved to /var/cache/conftool/dbconfig/20250423-104627-fceratto.json
10:42 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
10:41 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
10:41 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
10:41 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
10:41 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
10:40 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
10:40 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
10:40 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
10:39 hnowlan: migrating various minor mobileapps/PCS APIs to serve via the rest-gateway instead of restbase
10:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T391056)', diff saved to https://phabricator.wikimedia.org/P75291 and previous config saved to /var/cache/conftool/dbconfig/20250423-102752-fceratto.json
10:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: Maintenance
10:06 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1004.eqiad.wmnet
09:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.3.0.e.f.0.0.0.a.0.8.c.e.2.0.a.2.ip6.arpa on all recursors
09:57 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.3.0.e.f.0.0.0.a.0.8.c.e.2.0.a.2.ip6.arpa on all recursors
09:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.3.0.e.f.0.0.0.a.0.8.c.e.2.0.a.2.ip6.arpa on all recursors
09:57 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.3.0.e.f.0.0.0.a.0.8.c.e.2.0.a.2.ip6.arpa on all recursors
09:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:56 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: correct dns record for cloudgw vip eqiad - cmooney@cumin1002"
09:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: correct dns record for cloudgw vip eqiad - cmooney@cumin1002"
09:52 cmooney@cumin1002: START - Cookbook sre.dns.netbox
09:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
09:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox
09:29 cmooney@cumin1002: START - Cookbook sre.dns.netbox
09:10 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1004.eqiad.wmnet
09:04 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1004.eqiad.wmnet
08:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2001.codfw.wmnet
08:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2001.codfw.wmnet
08:18 moritzm: installing openjpeg2 security updates
08:02 taavi@deploy1003: Finished scap sync-world: Backport for Add WMCS v6 range to relevant exclusions (T386689) (duration: 11m 58s)
07:56 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
07:56 taavi@deploy1003: taavi: Continuing with sync
07:55 taavi@deploy1003: taavi: Backport for Add WMCS v6 range to relevant exclusions (T386689) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:52 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
07:51 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
07:50 taavi@deploy1003: Started scap sync-world: Backport for Add WMCS v6 range to relevant exclusions (T386689)
07:46 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
07:41 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
07:36 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
07:33 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
07:28 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
07:28 elukey: reboot ml-serve-ctrl* VMs to pick up new cpu/memory settings - T392289
07:27 elukey: elukey@ganeti1048:~$ sudo gnt-instance modify -B memory=6g,vcpus=4 ml-serve-ctrl1001.eqiad.wmnet - T392289
07:27 elukey: elukey@ganeti1048:~$ sudo gnt-instance modify -B memory=6g,vcpus=4 ml-serve-ctrl1002.eqiad.wmnet - T392289
07:27 elukey: elukey@ganeti2032:~$ sudo gnt-instance modify -B memory=6g,vcpus=4 ml-serve-ctrl2002.codfw.wmnet - T392289
07:26 elukey: elukey@ganeti2032:~$ sudo gnt-instance modify -B memory=6g,vcpus=4 ml-serve-ctrl2001.codfw.wmnet - T392289
07:24 kartik@deploy1003: Finished scap sync-world: Backport for Add channel for ContentTranslation logging (T391311) (duration: 16m 53s)
07:19 moritzm: installing libapache2-mod-auth-openidc security updates
07:17 kartik@deploy1003: abi, kartik: Continuing with sync
07:12 kartik@deploy1003: abi, kartik: Backport for Add channel for ContentTranslation logging (T391311) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:07 kartik@deploy1003: Started scap sync-world: Backport for Add channel for ContentTranslation logging (T391311)
06:31 moritzm: installing erlang security updates

2025-04-22

23:56 TimStarling: running cleanupBlocks.php on all wikis T389301
04:02 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.23 (duration: 02m 48s)
00:25 reedy@deploy1003: Synchronized wmf-config/InitialiseSettings-labs.php: Fix syntax (duration: 10m 53s)
00:03 reedy@deploy1003: Synchronized wmf-config/InitialiseSettings-labs.php: Fix syntax (duration: 11m 02s)

2025-04-21

23:12 cstone: civicrm upgraded from d7eefbc4 to b3038510
22:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (6 nodes at a time) for ElasticSearch cluster search_codfw: test manual mode - ryankemper@cumin2002 - T388610
22:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2071.codfw.wmnet with OS bullseye
21:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2071.codfw.wmnet with reason: host reimage
21:51 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2071.codfw.wmnet with reason: host reimage
21:38 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (6 nodes at a time) for ElasticSearch cluster search_codfw: test manual mode - ryankemper@cumin2002 - T388610
21:37 ryankemper@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (6 nodes at a time) for ElasticSearch cluster search_codfw: test manual mode - ryankemper@cumin2002 - T388610
21:37 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (6 nodes at a time) for ElasticSearch cluster search_codfw: test manual mode - ryankemper@cumin2002 - T388610
21:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2071
21:36 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2071
21:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2071
21:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2071.codfw.wmnet 70.32.192.10.in-addr.arpa 0.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:35 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2071.codfw.wmnet 70.32.192.10.in-addr.arpa 0.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:35 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2071 - bking@cumin2002"
21:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2071 - bking@cumin2002"
21:31 bking@cumin2002: START - Cookbook sre.dns.netbox
21:31 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2071
21:31 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2071.codfw.wmnet with OS bullseye
21:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2071 to cirrussearch2071
21:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2071
21:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2071
21:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2071 to cirrussearch2071 - bking@cumin2002"
21:27 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2071 to cirrussearch2071 - bking@cumin2002"
21:23 bking@cumin2002: START - Cookbook sre.dns.netbox
21:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2071 to cirrussearch2071
21:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2096 to cirrussearch2096
21:14 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2096
21:14 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2096
21:14 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:14 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2096 to cirrussearch2096 - bking@cumin2002"
21:08 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2096 to cirrussearch2096 - bking@cumin2002"
21:03 bking@cumin2002: START - Cookbook sre.dns.netbox
21:03 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2096 to cirrussearch2096
20:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2110.codfw.wmnet with OS bullseye
20:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2095.codfw.wmnet with OS bullseye
20:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2110.codfw.wmnet with reason: host reimage
20:36 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2110.codfw.wmnet with reason: host reimage
20:29 jdrewniak@deploy1003: Finished scap sync-world: Backport for Design Research Participant Survey: Deploy (T392325), Enable reading list beta feature for beta cluster (T390881), Create EventStream configuration for PES1.3 Wikirun Game (duration: 18m 44s)
20:22 jdrewniak@deploy1003: jdrewniak, dani, bwang: Continuing with sync
20:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2095.codfw.wmnet with reason: host reimage
20:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2110.codfw.wmnet with OS bullseye
20:17 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2095.codfw.wmnet with reason: host reimage
20:16 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2110 to cirrussearch2110
20:15 jdrewniak@deploy1003: jdrewniak, dani, bwang: Backport for Design Research Participant Survey: Deploy (T392325), Enable reading list beta feature for beta cluster (T390881), Create EventStream configuration for PES1.3 Wikirun Game synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:15 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2110
20:15 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2110
20:14 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:14 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2110 to cirrussearch2110 - bking@cumin2002"
20:14 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2110 to cirrussearch2110 - bking@cumin2002"
20:11 jdrewniak@deploy1003: Started scap sync-world: Backport for Design Research Participant Survey: Deploy (T392325), Enable reading list beta feature for beta cluster (T390881), Create EventStream configuration for PES1.3 Wikirun Game
20:10 bking@cumin2002: START - Cookbook sre.dns.netbox
20:09 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2110 to cirrussearch2110
20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2095
20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2095
20:00 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2095
20:00 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2095.codfw.wmnet 232.16.192.10.in-addr.arpa 2.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:59 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2095.codfw.wmnet 232.16.192.10.in-addr.arpa 2.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:59 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2095 - bking@cumin2002"
19:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2095 - bking@cumin2002"
19:53 bking@cumin2002: START - Cookbook sre.dns.netbox
19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2095
19:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2095.codfw.wmnet with OS bullseye
19:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2095 to cirrussearch2095
19:52 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2095
19:51 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2095
19:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2095 to cirrussearch2095 - bking@cumin2002"
19:51 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2095 to cirrussearch2095 - bking@cumin2002"
19:47 bking@cumin2002: START - Cookbook sre.dns.netbox
19:47 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2095 to cirrussearch2095
18:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2094.codfw.wmnet with OS bullseye
18:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2050.codfw.wmnet with OS bookworm
18:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2094.codfw.wmnet with reason: host reimage
17:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2094.codfw.wmnet with reason: host reimage
17:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2094
17:42 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2094
17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2094
17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2094.codfw.wmnet 230.16.192.10.in-addr.arpa 0.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2094.codfw.wmnet 230.16.192.10.in-addr.arpa 0.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2094 - bking@cumin2002"
17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2094 - bking@cumin2002"
17:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2050.codfw.wmnet with reason: host reimage
17:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2050.codfw.wmnet with reason: host reimage
17:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2050.codfw.wmnet with OS bookworm
17:24 bking@cumin2002: START - Cookbook sre.dns.netbox
17:24 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2094
17:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2094.codfw.wmnet with OS bullseye
17:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2094 to cirrussearch2094
17:23 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2094
17:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2094
17:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:22 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2094 to cirrussearch2094 - bking@cumin2002"
17:22 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2094 to cirrussearch2094 - bking@cumin2002"
17:18 bking@cumin2002: START - Cookbook sre.dns.netbox
17:18 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2094 to cirrussearch2094
16:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2064.codfw.wmnet with OS bullseye
16:13 urandom: decommissioning Cassandra/restbase1030-{a,b,c} — T389423
16:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2064.codfw.wmnet with reason: host reimage
16:11 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1030.eqiad.wmnet with reason: Decommissioning — T378725
16:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2064.codfw.wmnet with reason: host reimage
16:03 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2103.codfw.wmnet|cirrussearch2104.codfw.wmnet|cirrussearch2105.codfw.wmnet|cirrussearch2107.codfw.wmnet|cirrussearch2109.codfw.wmnet|cirrussearch2111.codfw.wmnet|cirrussearch2112.codfw.wmnet|cirrussearch2114.codfw.wmnet|cirrussearch2115.codfw.wmnet
16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2097\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2091\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2090\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2089\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2088\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2087\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2085\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2082\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2079\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2077\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2075\.codfw\.wmnet
16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2074\.codfw\.wmnet
15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2072\.codfw\.wmnet
15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2070\.codfw\.wmnet
15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2069\.codfw\.wmnet
15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2068\.codfw\.wmnet
15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2067\.codfw\.wmnet
15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2066\.codfw\.wmnet
15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2065\.codfw\.wmnet
15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2063\.codfw\.wmnet
15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2062\.codfw\.wmnet
15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2060\.codfw\.wmnet
15:58 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2059\.codfw\.wmnet
15:58 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2058\.codfw\.wmnet
15:58 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2057\.codfw\.wmnet
15:58 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2056\.codfw\.wmnet
15:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2064
15:53 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2064
15:53 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2064
15:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2064.codfw.wmnet 109.16.192.10.in-addr.arpa 9.0.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:53 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2064.codfw.wmnet 109.16.192.10.in-addr.arpa 9.0.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:53 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2064 - bking@cumin2002"
15:53 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2064 - bking@cumin2002"
15:52 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2057\.codfw\.wmnet
15:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:48 bking@cumin2002: START - Cookbook sre.dns.netbox
15:47 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2064
15:47 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2064.codfw.wmnet with OS bullseye
15:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2045.codfw.wmnet with OS bookworm
15:45 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2057
15:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2064 to cirrussearch2064
15:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2064
15:43 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2064
15:43 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:43 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2064 to cirrussearch2064 - bking@cumin2002"
15:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2064 to cirrussearch2064 - bking@cumin2002"
15:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
15:38 bking@cumin2002: START - Cookbook sre.dns.netbox
15:37 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2064 to cirrussearch2064
15:23 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
15:21 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
15:09 bking@cumin2002: conftool action : set/pooled=yes; selector: name=elastic2078\.codfw\.wmnet
15:03 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
14:57 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
14:56 bking@cumin2002: conftool action : set/pooled=no; selector: name=elastic2078\.codfw\.wmnet
14:49 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
14:46 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
14:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2057.codfw.wmnet with OS bullseye
14:28 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
14:19 taavi@deploy1003: Finished scap sync-world: Backport for Design Research Participant Survey: Pre-deploy (T392325) (duration: 14m 53s)
14:12 taavi@deploy1003: taavi, dani: Continuing with sync
14:09 taavi@deploy1003: taavi, dani: Backport for Design Research Participant Survey: Pre-deploy (T392325) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:04 taavi@deploy1003: Started scap sync-world: Backport for Design Research Participant Survey: Pre-deploy (T392325)
14:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2057.codfw.wmnet with reason: host reimage
14:00 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2057.codfw.wmnet with reason: host reimage
13:53 taavi: taavi@deploy1003 ~ $ echo "https://en.wikipedia.org/static/images/mobile/copyright/wikimaniawiki-wordmark.svg" | mwscript-k8s --attach purgeList.php -- --wiki enwiki
13:49 taavi@deploy1003: Finished scap sync-world: Backport for wikimaniawiki: update logo to 2025 (T392239), Enable mobile sitenotice for shwiki (T392334) (duration: 41m 04s)
13:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2057
13:44 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2057
13:43 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2057
13:43 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2057.codfw.wmnet 204.16.192.10.in-addr.arpa 4.0.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:43 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2057.codfw.wmnet 204.16.192.10.in-addr.arpa 4.0.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:43 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:43 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2057 - bking@cumin2002"
13:43 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2057 - bking@cumin2002"
13:40 taavi@deploy1003: robertsky, taavi, aleksandar: Continuing with sync
13:39 bking@cumin2002: START - Cookbook sre.dns.netbox
13:38 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2057
13:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2057.codfw.wmnet with OS bullseye
13:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2057 to cirrussearch2057
13:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2057
13:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2057
13:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:36 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2057 to cirrussearch2057 - bking@cumin2002"
13:36 taavi@deploy1003: robertsky, taavi, aleksandar: Backport for wikimaniawiki: update logo to 2025 (T392239), Enable mobile sitenotice for shwiki (T392334) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2057 to cirrussearch2057 - bking@cumin2002"
13:30 bking@cumin2002: START - Cookbook sre.dns.netbox
13:29 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2057 to cirrussearch2057
13:08 taavi@deploy1003: Started scap sync-world: Backport for wikimaniawiki: update logo to 2025 (T392239), Enable mobile sitenotice for shwiki (T392334)

2025-04-19

16:48 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1002.eqiad.wmnet
16:44 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1002.eqiad.wmnet
16:44 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1003.eqiad.wmnet
16:40 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1003.eqiad.wmnet
16:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl2003.codfw.wmnet
16:38 elukey: `sudo gnt-instance modify -B memory=6g,vcpus=4 aux-k8s-ctrl1002.eqiad.wmnet` - T392289
16:38 elukey: `sudo gnt-instance modify -B memory=6g,vcpus=4 aux-k8s-ctrl1003.eqiad.wmnet` - T392289
16:36 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl2003.codfw.wmnet
16:35 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl2002.codfw.wmnet
16:32 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl2002.codfw.wmnet
16:30 elukey: `sudo gnt-instance modify -B memory=6g,vcpus=4 aux-k8s-ctrl2002.codfw.wmnet` - T392289
16:30 elukey: `sudo gnt-instance modify -B memory=6g,vcpus=4 aux-k8s-ctrl2003.codfw.wmnet` - T392289
00:26 urandom: decommissioning Cassandra/restbase1029-{a,b,c} — T389423
00:24 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1029.eqiad.wmnet with reason: Decommissioning — T389423
00:13 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
00:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED

2025-04-18

23:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:56 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:54 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:53 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1179
21:52 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1179
21:51 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:48 vriley@cumin1002: START - Cookbook sre.dns.netbox
21:11 brett@dns1005: END - running authdns-update
21:09 brett@dns1005: START - running authdns-update
15:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
14:53 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1015.eqiad.wmnet
14:45 eevans@cumin1002: START - Cookbook sre.hosts.reboot-single for host aqs1015.eqiad.wmnet
14:45 urandom: rebooting aqs1015.eqiad.wmnet (drive detection/ordering) — T391903
14:40 _joe_: enabled slow query log on db1218, investigating T390510
11:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 200132
11:17 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 200132
10:08 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:57 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:56 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:53 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:52 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe1016
09:52 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe1016
09:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:49 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:48 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:48 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ms-fe1016] - vriley@cumin1002"
09:47 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ms-fe1016] - vriley@cumin1002"
09:43 vriley@cumin1002: START - Cookbook sre.dns.netbox
09:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-fe1015']
09:39 vriley@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1015']
09:38 vriley@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-fe1015']
09:38 vriley@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1015']
09:02 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ms-fe1015.eqiad.wmnet with OS bullseye
09:00 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:45 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1182.eqiad.wmnet with OS bullseye
08:45 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
08:40 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
08:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:37 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe1015
08:37 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe1015
08:36 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:36 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ms-fe1015] - vriley@cumin1002"
08:36 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ms-fe1015] - vriley@cumin1002"
08:30 vriley@cumin1002: START - Cookbook sre.dns.netbox
08:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:18 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1182.eqiad.wmnet with reason: host reimage
08:14 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1182.eqiad.wmnet with reason: host reimage
07:58 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1182.eqiad.wmnet with OS bullseye
07:57 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1182.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:56 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1182.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:52 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:45 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:22 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1184.eqiad.wmnet with OS bullseye
07:22 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
06:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2239.codfw.wmnet with reason: Maintenance
06:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T391056)', diff saved to https://phabricator.wikimedia.org/P75283 and previous config saved to /var/cache/conftool/dbconfig/20250418-062830-fceratto.json
06:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P75282 and previous config saved to /var/cache/conftool/dbconfig/20250418-061324-fceratto.json
05:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P75281 and previous config saved to /var/cache/conftool/dbconfig/20250418-055816-fceratto.json
05:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T391056)', diff saved to https://phabricator.wikimedia.org/P75280 and previous config saved to /var/cache/conftool/dbconfig/20250418-054309-fceratto.json
05:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T391056)', diff saved to https://phabricator.wikimedia.org/P75279 and previous config saved to /var/cache/conftool/dbconfig/20250418-053713-fceratto.json
05:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2237.codfw.wmnet with reason: Maintenance
05:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T391056)', diff saved to https://phabricator.wikimedia.org/P75278 and previous config saved to /var/cache/conftool/dbconfig/20250418-053648-fceratto.json
05:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P75277 and previous config saved to /var/cache/conftool/dbconfig/20250418-052141-fceratto.json
05:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P75276 and previous config saved to /var/cache/conftool/dbconfig/20250418-050635-fceratto.json
04:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T391056)', diff saved to https://phabricator.wikimedia.org/P75275 and previous config saved to /var/cache/conftool/dbconfig/20250418-045127-fceratto.json
04:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T391056)', diff saved to https://phabricator.wikimedia.org/P75274 and previous config saved to /var/cache/conftool/dbconfig/20250418-044545-fceratto.json
04:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2236.codfw.wmnet with reason: Maintenance
04:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75273 and previous config saved to /var/cache/conftool/dbconfig/20250418-044523-fceratto.json
04:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P75272 and previous config saved to /var/cache/conftool/dbconfig/20250418-043015-fceratto.json
04:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P75271 and previous config saved to /var/cache/conftool/dbconfig/20250418-041508-fceratto.json
04:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75270 and previous config saved to /var/cache/conftool/dbconfig/20250418-040001-fceratto.json
03:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75269 and previous config saved to /var/cache/conftool/dbconfig/20250418-035406-fceratto.json
03:53 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2219.codfw.wmnet with reason: Maintenance
03:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T391056)', diff saved to https://phabricator.wikimedia.org/P75268 and previous config saved to /var/cache/conftool/dbconfig/20250418-035342-fceratto.json
03:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P75267 and previous config saved to /var/cache/conftool/dbconfig/20250418-033834-fceratto.json
03:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P75266 and previous config saved to /var/cache/conftool/dbconfig/20250418-032327-fceratto.json
03:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T391056)', diff saved to https://phabricator.wikimedia.org/P75265 and previous config saved to /var/cache/conftool/dbconfig/20250418-030820-fceratto.json
03:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T391056)', diff saved to https://phabricator.wikimedia.org/P75264 and previous config saved to /var/cache/conftool/dbconfig/20250418-030239-fceratto.json
03:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2210.codfw.wmnet with reason: Maintenance
03:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75263 and previous config saved to /var/cache/conftool/dbconfig/20250418-030216-fceratto.json
02:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P75262 and previous config saved to /var/cache/conftool/dbconfig/20250418-024709-fceratto.json
02:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P75261 and previous config saved to /var/cache/conftool/dbconfig/20250418-023202-fceratto.json
02:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75260 and previous config saved to /var/cache/conftool/dbconfig/20250418-021655-fceratto.json
02:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75259 and previous config saved to /var/cache/conftool/dbconfig/20250418-021122-fceratto.json
02:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2206.codfw.wmnet with reason: Maintenance
02:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2199.codfw.wmnet with reason: Maintenance
02:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T391056)', diff saved to https://phabricator.wikimedia.org/P75258 and previous config saved to /var/cache/conftool/dbconfig/20250418-020728-fceratto.json
01:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P75257 and previous config saved to /var/cache/conftool/dbconfig/20250418-015221-fceratto.json
01:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P75256 and previous config saved to /var/cache/conftool/dbconfig/20250418-013714-fceratto.json
01:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T391056)', diff saved to https://phabricator.wikimedia.org/P75255 and previous config saved to /var/cache/conftool/dbconfig/20250418-012207-fceratto.json
01:16 wfan: civicrm upgraded from 38a7a649 to d7eefbc4
01:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T391056)', diff saved to https://phabricator.wikimedia.org/P75254 and previous config saved to /var/cache/conftool/dbconfig/20250418-011558-fceratto.json
01:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2179.codfw.wmnet with reason: Maintenance
01:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T391056)', diff saved to https://phabricator.wikimedia.org/P75253 and previous config saved to /var/cache/conftool/dbconfig/20250418-011536-fceratto.json
01:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P75252 and previous config saved to /var/cache/conftool/dbconfig/20250418-010030-fceratto.json
00:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P75251 and previous config saved to /var/cache/conftool/dbconfig/20250418-004524-fceratto.json
00:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T391056)', diff saved to https://phabricator.wikimedia.org/P75250 and previous config saved to /var/cache/conftool/dbconfig/20250418-003016-fceratto.json
00:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T391056)', diff saved to https://phabricator.wikimedia.org/P75249 and previous config saved to /var/cache/conftool/dbconfig/20250418-002408-fceratto.json
00:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2172.codfw.wmnet with reason: Maintenance
00:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T391056)', diff saved to https://phabricator.wikimedia.org/P75248 and previous config saved to /var/cache/conftool/dbconfig/20250418-002344-fceratto.json
00:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P75247 and previous config saved to /var/cache/conftool/dbconfig/20250418-000838-fceratto.json

2025-04-17

23:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P75246 and previous config saved to /var/cache/conftool/dbconfig/20250417-235331-fceratto.json
23:40 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2078.codfw.wmnet with OS bullseye
23:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T391056)', diff saved to https://phabricator.wikimedia.org/P75245 and previous config saved to /var/cache/conftool/dbconfig/20250417-233825-fceratto.json
23:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T391056)', diff saved to https://phabricator.wikimedia.org/P75244 and previous config saved to /var/cache/conftool/dbconfig/20250417-233211-fceratto.json
23:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
23:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2155.codfw.wmnet with reason: Maintenance
23:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T391056)', diff saved to https://phabricator.wikimedia.org/P75243 and previous config saved to /var/cache/conftool/dbconfig/20250417-233131-fceratto.json
23:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P75242 and previous config saved to /var/cache/conftool/dbconfig/20250417-231625-fceratto.json
23:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P75241 and previous config saved to /var/cache/conftool/dbconfig/20250417-230118-fceratto.json
22:58 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
22:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2097.codfw.wmnet with OS bullseye
22:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T391056)', diff saved to https://phabricator.wikimedia.org/P75240 and previous config saved to /var/cache/conftool/dbconfig/20250417-224611-fceratto.json
22:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T391056)', diff saved to https://phabricator.wikimedia.org/P75239 and previous config saved to /var/cache/conftool/dbconfig/20250417-223957-fceratto.json
22:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2147.codfw.wmnet with reason: Maintenance
22:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1252.eqiad.wmnet with reason: Maintenance
22:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P75238 and previous config saved to /var/cache/conftool/dbconfig/20250417-223130-fceratto.json
22:30 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
22:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2097.codfw.wmnet with reason: host reimage
22:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
22:20 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
22:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
22:19 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
22:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2097.codfw.wmnet with reason: host reimage
22:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P75237 and previous config saved to /var/cache/conftool/dbconfig/20250417-221623-fceratto.json
22:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
22:15 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
22:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
22:14 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1184.eqiad.wmnet with reason: host reimage
22:13 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2078.codfw.wmnet']
22:10 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1184.eqiad.wmnet with reason: host reimage
22:04 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2078.codfw.wmnet']
22:04 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cirrussearch2078.codfw.wmnet']
22:03 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2078.codfw.wmnet']
22:02 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
22:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2097
22:01 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2097
22:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P75236 and previous config saved to /var/cache/conftool/dbconfig/20250417-220116-fceratto.json
22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2097
22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2097.codfw.wmnet 234.16.192.10.in-addr.arpa 4.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2097.codfw.wmnet 234.16.192.10.in-addr.arpa 4.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2097 - bking@cumin2002"
22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2097 - bking@cumin2002"
21:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
21:58 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
21:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
21:57 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
21:54 bking@cumin2002: START - Cookbook sre.dns.netbox
21:52 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1184.eqiad.wmnet with OS bullseye
21:50 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2097
21:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2097.codfw.wmnet with OS bullseye
21:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P75235 and previous config saved to /var/cache/conftool/dbconfig/20250417-214610-fceratto.json
21:46 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
21:46 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
21:46 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
21:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2097.codfw.wmnet on all recursors
21:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2097.codfw.wmnet on all recursors
21:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2097 to cirrussearch2097
21:42 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2097
21:41 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2097
21:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:41 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2097 to cirrussearch2097 - bking@cumin2002"
21:40 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2097 to cirrussearch2097 - bking@cumin2002"
21:37 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1183.eqiad.wmnet with OS bullseye
21:35 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:20 bking@cumin2002: START - Cookbook sre.dns.netbox
21:19 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2097 to cirrussearch2097
21:19 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2078.codfw.wmnet with OS bullseye
21:19 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host cirrussearch2078
21:19 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
21:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
21:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2078.codfw.wmnet on all recursors
21:18 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2078.codfw.wmnet on all recursors
21:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2078 to cirrussearch2078
21:17 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2078
21:17 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2078
21:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:17 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2078 to cirrussearch2078 - bking@cumin2002"
21:17 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2078 to cirrussearch2078 - bking@cumin2002"
21:16 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:15 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1183.eqiad.wmnet with reason: host reimage
21:13 bking@cumin2002: START - Cookbook sre.dns.netbox
21:12 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2078 to cirrussearch2078
21:11 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1183.eqiad.wmnet with reason: host reimage
21:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2058.codfw.wmnet with OS bullseye
20:56 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye
20:55 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
20:50 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1178.eqiad.wmnet with OS bullseye
20:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P75234 and previous config saved to /var/cache/conftool/dbconfig/20250417-204552-fceratto.json
20:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1249.eqiad.wmnet with reason: Maintenance
20:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T391056)', diff saved to https://phabricator.wikimedia.org/P75233 and previous config saved to /var/cache/conftool/dbconfig/20250417-204528-fceratto.json
20:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2058.codfw.wmnet with reason: host reimage
20:37 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
20:34 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2058.codfw.wmnet with reason: host reimage
20:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P75232 and previous config saved to /var/cache/conftool/dbconfig/20250417-203021-fceratto.json
20:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1181.eqiad.wmnet with OS bullseye
20:25 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1178.eqiad.wmnet with reason: host reimage
20:25 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1178.eqiad.wmnet with reason: host reimage
20:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2058
20:18 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2058
20:18 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2058
20:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2058.codfw.wmnet 205.16.192.10.in-addr.arpa 5.0.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
20:18 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2058.codfw.wmnet 205.16.192.10.in-addr.arpa 5.0.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
20:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:18 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2058 - bking@cumin2002"
20:18 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2058 - bking@cumin2002"
20:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P75231 and previous config saved to /var/cache/conftool/dbconfig/20250417-201515-fceratto.json
20:13 bking@cumin2002: START - Cookbook sre.dns.netbox
20:10 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1178.eqiad.wmnet with OS bullseye
20:09 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
20:09 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2058
20:08 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2058.codfw.wmnet with OS bullseye
20:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2058.codfw.wmnet on all recursors
20:07 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2058.codfw.wmnet on all recursors
20:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2058 to cirrussearch2058
20:06 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2058
20:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2058
20:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:06 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2058 to cirrussearch2058 - bking@cumin2002"
20:05 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2058 to cirrussearch2058 - bking@cumin2002"
20:02 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1181.eqiad.wmnet with reason: host reimage
20:00 bking@cumin2002: START - Cookbook sre.dns.netbox
20:00 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2058 to cirrussearch2058
20:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T391056)', diff saved to https://phabricator.wikimedia.org/P75230 and previous config saved to /var/cache/conftool/dbconfig/20250417-200008-fceratto.json
19:59 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
19:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
19:58 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
19:58 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1181.eqiad.wmnet with reason: host reimage
19:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T391056)', diff saved to https://phabricator.wikimedia.org/P75229 and previous config saved to /var/cache/conftool/dbconfig/20250417-195506-fceratto.json
19:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1248.eqiad.wmnet with reason: Maintenance
19:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T391056)', diff saved to https://phabricator.wikimedia.org/P75228 and previous config saved to /var/cache/conftool/dbconfig/20250417-195442-fceratto.json
19:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
19:50 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1178.eqiad.wmnet with OS bullseye
19:44 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
19:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
19:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
19:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P75226 and previous config saved to /var/cache/conftool/dbconfig/20250417-193935-fceratto.json
19:36 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1178.eqiad.wmnet with OS bullseye
19:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1178.eqiad.wmnet with OS bullseye
19:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P75225 and previous config saved to /var/cache/conftool/dbconfig/20250417-192430-fceratto.json
19:22 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1178.eqiad.wmnet with OS bullseye
19:21 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
19:14 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
19:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T391056)', diff saved to https://phabricator.wikimedia.org/P75223 and previous config saved to /var/cache/conftool/dbconfig/20250417-190923-fceratto.json
19:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T391056)', diff saved to https://phabricator.wikimedia.org/P75222 and previous config saved to /var/cache/conftool/dbconfig/20250417-190331-fceratto.json
19:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1247.eqiad.wmnet with reason: Maintenance
18:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1245.eqiad.wmnet with reason: Maintenance
18:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T391056)', diff saved to https://phabricator.wikimedia.org/P75221 and previous config saved to /var/cache/conftool/dbconfig/20250417-185930-fceratto.json
18:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P75219 and previous config saved to /var/cache/conftool/dbconfig/20250417-184423-fceratto.json
18:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P75218 and previous config saved to /var/cache/conftool/dbconfig/20250417-182916-fceratto.json
18:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T391056)', diff saved to https://phabricator.wikimedia.org/P75217 and previous config saved to /var/cache/conftool/dbconfig/20250417-181408-fceratto.json
18:13 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.25 refs T386220
17:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T391056)', diff saved to https://phabricator.wikimedia.org/P75216 and previous config saved to /var/cache/conftool/dbconfig/20250417-175614-fceratto.json
17:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1243.eqiad.wmnet with reason: Maintenance
17:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T391056)', diff saved to https://phabricator.wikimedia.org/P75215 and previous config saved to /var/cache/conftool/dbconfig/20250417-175552-fceratto.json
17:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P75214 and previous config saved to /var/cache/conftool/dbconfig/20250417-174046-fceratto.json
17:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P75213 and previous config saved to /var/cache/conftool/dbconfig/20250417-172539-fceratto.json
17:20 mutante: idp-test2005 - 100% disk space used, alerting for over 6 days (is there a point in alerts for test hosts?) - apt-get clean brought it back to 94%
17:12 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:11 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T391056)', diff saved to https://phabricator.wikimedia.org/P75212 and previous config saved to /var/cache/conftool/dbconfig/20250417-171032-fceratto.json
17:09 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:09 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:09 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:08 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T391056)', diff saved to https://phabricator.wikimedia.org/P75211 and previous config saved to /var/cache/conftool/dbconfig/20250417-170438-fceratto.json
17:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1242.eqiad.wmnet with reason: Maintenance
17:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T391056)', diff saved to https://phabricator.wikimedia.org/P75210 and previous config saved to /var/cache/conftool/dbconfig/20250417-170416-fceratto.json
17:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2066.codfw.wmnet with OS bullseye
16:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P75209 and previous config saved to /var/cache/conftool/dbconfig/20250417-164909-fceratto.json
16:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2066.codfw.wmnet with reason: host reimage
16:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P75208 and previous config saved to /var/cache/conftool/dbconfig/20250417-163403-fceratto.json
16:32 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2066.codfw.wmnet with reason: host reimage
16:30 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
16:30 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
16:30 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
16:30 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
16:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T391056)', diff saved to https://phabricator.wikimedia.org/P75207 and previous config saved to /var/cache/conftool/dbconfig/20250417-161854-fceratto.json
16:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2066
16:17 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2066
16:15 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2066
16:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2066.codfw.wmnet 69.32.192.10.in-addr.arpa 9.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:15 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2066.codfw.wmnet 69.32.192.10.in-addr.arpa 9.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:15 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2066 - bking@cumin2002"
16:15 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2066 - bking@cumin2002"
16:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T391056)', diff saved to https://phabricator.wikimedia.org/P75206 and previous config saved to /var/cache/conftool/dbconfig/20250417-161307-fceratto.json
16:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1241.eqiad.wmnet with reason: Maintenance
16:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75205 and previous config saved to /var/cache/conftool/dbconfig/20250417-161245-fceratto.json
16:11 bking@cumin2002: START - Cookbook sre.dns.netbox
16:11 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2066
16:10 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2066.codfw.wmnet with OS bullseye
16:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2066 to cirrussearch2066
16:09 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2066
16:09 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2066
16:09 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2066 to cirrussearch2066 - bking@cumin2002"
16:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2066 to cirrussearch2066 - bking@cumin2002"
15:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P75204 and previous config saved to /var/cache/conftool/dbconfig/20250417-155738-fceratto.json
15:54 bking@cumin2002: START - Cookbook sre.dns.netbox
15:53 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2066 to cirrussearch2066
15:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P75201 and previous config saved to /var/cache/conftool/dbconfig/20250417-154231-fceratto.json
15:34 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwdebug2001.codfw.wmnet with OS bullseye
15:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75200 and previous config saved to /var/cache/conftool/dbconfig/20250417-152724-fceratto.json
15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75199 and previous config saved to /var/cache/conftool/dbconfig/20250417-151330-fceratto.json
15:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1238.eqiad.wmnet with reason: Maintenance
15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75198 and previous config saved to /var/cache/conftool/dbconfig/20250417-151308-fceratto.json
14:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P75197 and previous config saved to /var/cache/conftool/dbconfig/20250417-145801-fceratto.json
14:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug2001.codfw.wmnet with reason: host reimage
14:55 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage cirrussearch hosts - bking@cumin2002 - T388610
14:53 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug2001.codfw.wmnet with reason: host reimage
14:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P75196 and previous config saved to /var/cache/conftool/dbconfig/20250417-144254-fceratto.json
14:36 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mwdebug2001.codfw.wmnet with OS bullseye
14:32 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
14:31 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
14:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75195 and previous config saved to /var/cache/conftool/dbconfig/20250417-142746-fceratto.json
14:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75194 and previous config saved to /var/cache/conftool/dbconfig/20250417-142221-fceratto.json
14:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
14:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1221.eqiad.wmnet with reason: Maintenance
14:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T391056)', diff saved to https://phabricator.wikimedia.org/P75193 and previous config saved to /var/cache/conftool/dbconfig/20250417-142139-fceratto.json
14:11 hashar: Restarting Gerrit to apply replication configuration change
14:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P75191 and previous config saved to /var/cache/conftool/dbconfig/20250417-140632-fceratto.json
14:02 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug2001.codfw.wmnet
14:02 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug1002.codfw.wmnet
14:02 jiji@cumin1002: conftool action : set/pooled=yes; selector: name=mwdebug2002.codfw.wmnet
14:01 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug2002.codfw.wmnet
13:57 dcausse: closing the UTC afternoon backport window
13:54 dcausse@deploy1003: Finished scap sync-world: Backport for wikimaniawiki: add extendedconfirmed to translationadmin (T389729) (duration: 13m 25s)
13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P75190 and previous config saved to /var/cache/conftool/dbconfig/20250417-135125-fceratto.json
13:49 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage cirrussearch hosts - bking@cumin2002 - T388610
13:48 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage cirrussearch hosts - bking@cumin2002 - T388610
13:47 dcausse@deploy1003: dcausse, robertsky: Continuing with sync
13:46 dcausse@deploy1003: dcausse, robertsky: Backport for wikimaniawiki: add extendedconfirmed to translationadmin (T389729) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:41 dcausse@deploy1003: Started scap sync-world: Backport for wikimaniawiki: add extendedconfirmed to translationadmin (T389729)
13:38 dcausse@deploy1003: Finished scap sync-world: Backport for Gracefully handle BadRevisionException (T382904), Gracefully handle BadRevisionException (T382904) (duration: 12m 23s)
13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T391056)', diff saved to https://phabricator.wikimedia.org/P75189 and previous config saved to /var/cache/conftool/dbconfig/20250417-133618-fceratto.json
13:31 dcausse@deploy1003: dcausse: Continuing with sync
13:30 dcausse@deploy1003: dcausse: Backport for Gracefully handle BadRevisionException (T382904), Gracefully handle BadRevisionException (T382904) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T391056)', diff saved to https://phabricator.wikimedia.org/P75188 and previous config saved to /var/cache/conftool/dbconfig/20250417-133004-fceratto.json
13:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1199.eqiad.wmnet with reason: Maintenance
13:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T391056)', diff saved to https://phabricator.wikimedia.org/P75187 and previous config saved to /var/cache/conftool/dbconfig/20250417-132942-fceratto.json
13:28 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage cirrussearch hosts - bking@cumin2002 - T388610
13:25 dcausse@deploy1003: Started scap sync-world: Backport for Gracefully handle BadRevisionException (T382904), Gracefully handle BadRevisionException (T382904)
13:17 dreamyjazz@deploy1003: Finished scap sync-world: Backport for frwiki: Add abusefilter-access-protected-vars to EFM, remove it from sysops. (T381722) (duration: 13m 18s)
13:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P75186 and previous config saved to /var/cache/conftool/dbconfig/20250417-131435-fceratto.json
13:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1180.eqiad.wmnet with OS bullseye
13:10 dreamyjazz@deploy1003: dreamyjazz, wpld: Continuing with sync
13:09 dreamyjazz@deploy1003: dreamyjazz, wpld: Backport for frwiki: Add abusefilter-access-protected-vars to EFM, remove it from sysops. (T381722) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:03 dreamyjazz@deploy1003: Started scap sync-world: Backport for frwiki: Add abusefilter-access-protected-vars to EFM, remove it from sysops. (T381722)
12:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P75185 and previous config saved to /var/cache/conftool/dbconfig/20250417-125928-fceratto.json
12:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be1065.eqiad.wmnet
12:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.remove-downtime for ms-be1065.eqiad.wmnet
12:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1180.eqiad.wmnet with reason: host reimage
12:46 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:46 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1180.eqiad.wmnet with reason: host reimage
12:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T391056)', diff saved to https://phabricator.wikimedia.org/P75184 and previous config saved to /var/cache/conftool/dbconfig/20250417-124421-fceratto.json
12:43 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:42 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
12:38 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T391056)', diff saved to https://phabricator.wikimedia.org/P75183 and previous config saved to /var/cache/conftool/dbconfig/20250417-123804-fceratto.json
12:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: Maintenance
12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T391056)', diff saved to https://phabricator.wikimedia.org/P75182 and previous config saved to /var/cache/conftool/dbconfig/20250417-123742-fceratto.json
12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1180.eqiad.wmnet with OS bullseye
12:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1065.eqiad.wmnet with reason: vacuum overlarge container dbs
12:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be1066.eqiad.wmnet
12:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.remove-downtime for ms-be1066.eqiad.wmnet
12:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P75181 and previous config saved to /var/cache/conftool/dbconfig/20250417-122235-fceratto.json
12:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P75180 and previous config saved to /var/cache/conftool/dbconfig/20250417-120728-fceratto.json
11:57 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum overlarge container dbs
11:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T391056)', diff saved to https://phabricator.wikimedia.org/P75179 and previous config saved to /var/cache/conftool/dbconfig/20250417-115221-fceratto.json
11:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T391056)', diff saved to https://phabricator.wikimedia.org/P75178 and previous config saved to /var/cache/conftool/dbconfig/20250417-114551-fceratto.json
11:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: Maintenance
11:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
11:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum overlarge container dbs
09:14 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@1e9e1f9]: bump image suggestions to 1.5.0 (duration: 01m 54s)
09:13 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@1e9e1f9]: bump image suggestions to 1.5.0
08:58 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1091.eqiad.wmnet with OS bullseye
08:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1091.eqiad.wmnet with reason: host reimage
08:50 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet with OS bookworm
08:49 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1091.eqiad.wmnet with reason: host reimage
08:34 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1091.eqiad.wmnet with OS bullseye
08:34 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be1091.eqiad.wmnet with OS bullseye
08:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: host reimage
08:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: host reimage
08:20 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1091.eqiad.wmnet with OS bullseye
08:12 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve-ctrl1002.eqiad.wmnet with OS bookworm
07:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:41 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
05:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.gerrit.failover (exit_code=0) from gerrit2002.wikimedia.org to gerrit2003.wikimedia.org
05:28 arnaudb@cumin1002: START - Cookbook sre.gerrit.failover from gerrit2002.wikimedia.org to gerrit2003.wikimedia.org
05:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.gerrit.failover (exit_code=0) from gerrit2002.wikimedia.org to gerrit2003.wikimedia.org
05:27 arnaudb@cumin1002: START - Cookbook sre.gerrit.failover from gerrit2002.wikimedia.org to gerrit2003.wikimedia.org
05:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.gerrit.failover (exit_code=0) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
05:03 arnaudb@cumin1002: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
03:54 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
03:54 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
02:48 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
02:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
02:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
02:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
02:22 TheresNoTime: [samtar@mwmaint1002 ~]$ mwscript maintenance/cleanupTitles.php --wiki=shwiktionary # `Razgovor:Vikirečnik:Srpskohrvatski`
00:55 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
00:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75177 and previous config saved to /var/cache/conftool/dbconfig/20250417-002743-fceratto.json
00:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P75176 and previous config saved to /var/cache/conftool/dbconfig/20250417-001235-fceratto.json

2025-04-16

23:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P75175 and previous config saved to /var/cache/conftool/dbconfig/20250416-235728-fceratto.json
23:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75174 and previous config saved to /var/cache/conftool/dbconfig/20250416-234221-fceratto.json
23:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
23:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
23:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
23:33 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
23:33 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
23:33 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
23:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
23:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75173 and previous config saved to /var/cache/conftool/dbconfig/20250416-233200-fceratto.json
23:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2238.codfw.wmnet with reason: Maintenance
23:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T391056)', diff saved to https://phabricator.wikimedia.org/P75172 and previous config saved to /var/cache/conftool/dbconfig/20250416-233148-fceratto.json
23:28 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2091.codfw.wmnet with OS bullseye
23:16 urandom: decommissioning restbase1028/Cassandra — T389423
23:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P75171 and previous config saved to /var/cache/conftool/dbconfig/20250416-231641-fceratto.json
23:16 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
23:15 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
23:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
23:15 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1028.eqiad.wmnet with reason: Decommissioning — T389423
23:14 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2091.codfw.wmnet with OS bullseye
23:11 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1045.eqiad.wmnet
23:11 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1045.eqiad.wmnet
23:11 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1044.eqiad.wmnet
23:11 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1044.eqiad.wmnet
23:10 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1043.eqiad.wmnet
23:10 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1043.eqiad.wmnet
23:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P75170 and previous config saved to /var/cache/conftool/dbconfig/20250416-230134-fceratto.json
22:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
22:54 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
22:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
22:49 aude@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
22:49 aude@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
22:46 aude@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
22:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T391056)', diff saved to https://phabricator.wikimedia.org/P75169 and previous config saved to /var/cache/conftool/dbconfig/20250416-224627-fceratto.json
22:46 aude@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
22:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2226 (T391056)', diff saved to https://phabricator.wikimedia.org/P75168 and previous config saved to /var/cache/conftool/dbconfig/20250416-224405-fceratto.json
22:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
22:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2226.codfw.wmnet with reason: Maintenance
22:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T391056)', diff saved to https://phabricator.wikimedia.org/P75167 and previous config saved to /var/cache/conftool/dbconfig/20250416-224325-fceratto.json
22:36 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
22:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P75166 and previous config saved to /var/cache/conftool/dbconfig/20250416-222818-fceratto.json
22:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2079.codfw.wmnet with OS bullseye
22:21 aude@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
22:20 aude@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
22:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P75165 and previous config saved to /var/cache/conftool/dbconfig/20250416-221311-fceratto.json
22:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2079.codfw.wmnet with reason: host reimage
21:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T391056)', diff saved to https://phabricator.wikimedia.org/P75164 and previous config saved to /var/cache/conftool/dbconfig/20250416-215804-fceratto.json
21:56 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2079.codfw.wmnet with reason: host reimage
21:55 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2.*
21:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T391056)', diff saved to https://phabricator.wikimedia.org/P75163 and previous config saved to /var/cache/conftool/dbconfig/20250416-214710-fceratto.json
21:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2225.codfw.wmnet with reason: Maintenance
21:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T391056)', diff saved to https://phabricator.wikimedia.org/P75162 and previous config saved to /var/cache/conftool/dbconfig/20250416-214648-fceratto.json
21:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2079
21:41 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2079
21:41 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2079
21:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2079.codfw.wmnet 128.16.192.10.in-addr.arpa 8.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:41 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2079.codfw.wmnet 128.16.192.10.in-addr.arpa 8.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:41 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2079 - bking@cumin2002"
21:41 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2079 - bking@cumin2002"
21:33 reedy@deploy1003: Finished scap sync-world: Backport for specials: Fix PHP Warning on Special:PasswordReset for crafted input (T392086) (duration: 11m 47s)
21:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P75161 and previous config saved to /var/cache/conftool/dbconfig/20250416-213141-fceratto.json
21:30 bking@cumin2002: START - Cookbook sre.dns.netbox
21:28 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2079
21:27 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2079.codfw.wmnet with OS bullseye
21:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2079.codfw.wmnet on all recursors
21:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2079.codfw.wmnet on all recursors
21:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2079 to cirrussearch2079
21:26 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2079
21:26 reedy@deploy1003: reedy: Continuing with sync
21:26 reedy@deploy1003: reedy: Backport for specials: Fix PHP Warning on Special:PasswordReset for crafted input (T392086) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:25 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2079
21:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2079 to cirrussearch2079 - bking@cumin2002"
21:21 reedy@deploy1003: Started scap sync-world: Backport for specials: Fix PHP Warning on Special:PasswordReset for crafted input (T392086)
21:18 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2079 to cirrussearch2079 - bking@cumin2002"
21:17 ecarg@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
21:16 ecarg@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
21:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P75160 and previous config saved to /var/cache/conftool/dbconfig/20250416-211634-fceratto.json
21:16 ecarg@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
21:15 ecarg@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
21:14 ecarg@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
21:13 bking@cumin2002: START - Cookbook sre.dns.netbox
21:13 ecarg@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
21:13 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2079 to cirrussearch2079
21:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2077.codfw.wmnet with OS bullseye
21:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T391056)', diff saved to https://phabricator.wikimedia.org/P75159 and previous config saved to /var/cache/conftool/dbconfig/20250416-210128-fceratto.json
20:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T391056)', diff saved to https://phabricator.wikimedia.org/P75158 and previous config saved to /var/cache/conftool/dbconfig/20250416-205907-fceratto.json
20:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2204.codfw.wmnet with reason: Maintenance
20:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2077.codfw.wmnet with reason: host reimage
20:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2197.codfw.wmnet with reason: Maintenance
20:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T391056)', diff saved to https://phabricator.wikimedia.org/P75157 and previous config saved to /var/cache/conftool/dbconfig/20250416-204957-fceratto.json
20:48 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2077.codfw.wmnet with reason: host reimage
20:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P75156 and previous config saved to /var/cache/conftool/dbconfig/20250416-203450-fceratto.json
20:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2077
20:34 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2077
20:33 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2077
20:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2077.codfw.wmnet 125.16.192.10.in-addr.arpa 5.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
20:33 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2077.codfw.wmnet 125.16.192.10.in-addr.arpa 5.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
20:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:33 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2077 - bking@cumin2002"
20:33 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2077 - bking@cumin2002"
20:28 bking@cumin2002: START - Cookbook sre.dns.netbox
20:28 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2077
20:28 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2077.codfw.wmnet with OS bullseye
20:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2077.codfw.wmnet on all recursors
20:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2077.codfw.wmnet on all recursors
20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2077 to cirrussearch2077
20:26 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2077
20:26 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2077
20:26 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:26 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2077 to cirrussearch2077 - bking@cumin2002"
20:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2077 to cirrussearch2077 - bking@cumin2002"
20:20 bking@cumin2002: START - Cookbook sre.dns.netbox
20:20 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2077 to cirrussearch2077
20:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P75155 and previous config saved to /var/cache/conftool/dbconfig/20250416-201943-fceratto.json
20:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2063.codfw.wmnet with OS bullseye
20:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T391056)', diff saved to https://phabricator.wikimedia.org/P75154 and previous config saved to /var/cache/conftool/dbconfig/20250416-200437-fceratto.json
19:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T391056)', diff saved to https://phabricator.wikimedia.org/P75153 and previous config saved to /var/cache/conftool/dbconfig/20250416-195408-fceratto.json
19:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2189.codfw.wmnet with reason: Maintenance
19:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T391056)', diff saved to https://phabricator.wikimedia.org/P75152 and previous config saved to /var/cache/conftool/dbconfig/20250416-195345-fceratto.json
19:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2063.codfw.wmnet with reason: host reimage
19:45 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2063.codfw.wmnet with reason: host reimage
19:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P75151 and previous config saved to /var/cache/conftool/dbconfig/20250416-193838-fceratto.json
19:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
19:33 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
19:30 swfrench@deploy1003: Stopping before sync operations
19:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2063
19:30 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2063
19:30 swfrench@deploy1003: Started scap sync-world: Test stop-before-sync scap run to pick up make-container-image changes - T390251
19:30 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2063
19:30 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2063.codfw.wmnet 108.16.192.10.in-addr.arpa 8.0.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:30 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2063.codfw.wmnet 108.16.192.10.in-addr.arpa 8.0.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:30 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:30 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2063 - bking@cumin2002"
19:30 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2063 - bking@cumin2002"
19:25 bking@cumin2002: START - Cookbook sre.dns.netbox
19:25 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2063
19:25 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2063.codfw.wmnet with OS bullseye
19:24 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2063.codfw.wmnet on all recursors
19:24 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2063.codfw.wmnet on all recursors
19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2063 to cirrussearch2063
19:23 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2063
19:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2063
19:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P75150 and previous config saved to /var/cache/conftool/dbconfig/20250416-192330-fceratto.json
19:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:23 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2063 to cirrussearch2063 - bking@cumin2002"
19:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2063 to cirrussearch2063 - bking@cumin2002"
19:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T391056)', diff saved to https://phabricator.wikimedia.org/P75149 and previous config saved to /var/cache/conftool/dbconfig/20250416-190823-fceratto.json
19:06 bking@cumin2002: START - Cookbook sre.dns.netbox
19:06 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2063 to cirrussearch2063
18:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T391056)', diff saved to https://phabricator.wikimedia.org/P75148 and previous config saved to /var/cache/conftool/dbconfig/20250416-185651-fceratto.json
18:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2175.codfw.wmnet with reason: Maintenance
18:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T391056)', diff saved to https://phabricator.wikimedia.org/P75147 and previous config saved to /var/cache/conftool/dbconfig/20250416-185628-fceratto.json
18:44 sukhe: re-enable puppet on A:durum
18:42 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.25 refs T386220
18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3003.esams.wmnet with OS bookworm
18:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P75146 and previous config saved to /var/cache/conftool/dbconfig/20250416-184121-fceratto.json
18:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P75145 and previous config saved to /var/cache/conftool/dbconfig/20250416-182613-fceratto.json
18:22 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
18:19 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
18:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T391056)', diff saved to https://phabricator.wikimedia.org/P75144 and previous config saved to /var/cache/conftool/dbconfig/20250416-181105-fceratto.json
18:08 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
18:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2070.codfw.wmnet with OS bullseye
17:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T391056)', diff saved to https://phabricator.wikimedia.org/P75142 and previous config saved to /var/cache/conftool/dbconfig/20250416-175842-fceratto.json
17:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2148.codfw.wmnet with reason: Maintenance
17:55 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum3003.esams.wmnet with OS bookworm
17:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
17:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T391056)', diff saved to https://phabricator.wikimedia.org/P75140 and previous config saved to /var/cache/conftool/dbconfig/20250416-174828-fceratto.json
17:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2070.codfw.wmnet with reason: host reimage
17:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2070.codfw.wmnet with reason: host reimage
17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P75139 and previous config saved to /var/cache/conftool/dbconfig/20250416-173320-fceratto.json
17:33 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
17:33 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
17:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P75138 and previous config saved to /var/cache/conftool/dbconfig/20250416-171813-fceratto.json
17:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T391056)', diff saved to https://phabricator.wikimedia.org/P75137 and previous config saved to /var/cache/conftool/dbconfig/20250416-170305-fceratto.json
17:00 cgoubert@deploy1003: Finished scap sync-world: Deploy mediawiki chart 0.8.11 (duration: 03m 02s)
16:58 cgoubert@deploy1003: Started scap sync-world: Deploy mediawiki chart 0.8.11
16:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T391056)', diff saved to https://phabricator.wikimedia.org/P75136 and previous config saved to /var/cache/conftool/dbconfig/20250416-165118-fceratto.json
16:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1254.eqiad.wmnet with reason: Maintenance
16:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1239.eqiad.wmnet with reason: Maintenance
16:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T391056)', diff saved to https://phabricator.wikimedia.org/P75135 and previous config saved to /var/cache/conftool/dbconfig/20250416-164216-fceratto.json
16:37 kevinbazira@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
16:36 kevinbazira@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2070
16:36 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2070
16:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2070
16:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2070.codfw.wmnet 110.16.192.10.in-addr.arpa 0.1.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:36 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2070.codfw.wmnet 110.16.192.10.in-addr.arpa 0.1.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:33 bking@cumin2002: START - Cookbook sre.dns.netbox
16:33 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2070
16:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2070.codfw.wmnet with OS bullseye
16:32 kevinbazira@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
16:32 kevinbazira@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
16:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P75133 and previous config saved to /var/cache/conftool/dbconfig/20250416-162709-fceratto.json
16:23 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
16:22 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2110.codfw.wmnet on all recursors
16:22 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2110.codfw.wmnet on all recursors
16:22 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2095.codfw.wmnet on all recursors
16:22 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2095.codfw.wmnet on all recursors
16:21 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2070.codfw.wmnet with OS bullseye
16:21 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=93) for host cirrussearch2070
16:21 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:20 cgoubert@deploy1003: Finished scap sync-world: Deploy mediawiki chart 0.8.10 (duration: 03m 20s)
16:18 bking@cumin2002: START - Cookbook sre.dns.netbox
16:18 cgoubert@deploy1003: Started scap sync-world: Deploy mediawiki chart 0.8.10
16:18 bking@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cirrussearch2070
16:18 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2070
16:18 bking@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cirrussearch2070
16:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P75132 and previous config saved to /var/cache/conftool/dbconfig/20250416-161202-fceratto.json
16:11 sukhe@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3003.esams.wmnet with reason: testing ECH
16:10 sukhe: stopping bird on durum3003 to temporarily disable advertising of anycast IPs
16:08 sukhe: sudo cumin 'A:durum' 'disable-puppet "rolling out CR 1136772"'
16:07 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2070
16:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2070.codfw.wmnet 110.16.192.10.in-addr.arpa 0.1.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:07 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2070.codfw.wmnet 110.16.192.10.in-addr.arpa 0.1.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:07 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2070 - bking@cumin2002"
16:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2070 - bking@cumin2002"
16:07 kevinbazira@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
16:07 kevinbazira@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
16:01 bking@cumin2002: START - Cookbook sre.dns.netbox
16:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2070
16:00 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2070.codfw.wmnet with OS bullseye
15:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2070.codfw.wmnet on all recursors
15:59 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2070.codfw.wmnet on all recursors
15:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2070 to cirrussearch2070
15:58 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2070
15:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2070
15:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2070 to cirrussearch2070 - bking@cumin2002"
15:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T391056)', diff saved to https://phabricator.wikimedia.org/P75129 and previous config saved to /var/cache/conftool/dbconfig/20250416-155655-fceratto.json
15:51 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2070 to cirrussearch2070 - bking@cumin2002"
15:46 bking@cumin2002: START - Cookbook sre.dns.netbox
15:46 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2070 to cirrussearch2070
15:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T391056)', diff saved to https://phabricator.wikimedia.org/P75128 and previous config saved to /var/cache/conftool/dbconfig/20250416-154515-fceratto.json
15:45 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
15:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1233.eqiad.wmnet with reason: Maintenance
15:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T391056)', diff saved to https://phabricator.wikimedia.org/P75127 and previous config saved to /var/cache/conftool/dbconfig/20250416-154452-fceratto.json
15:32 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P75126 and previous config saved to /var/cache/conftool/dbconfig/20250416-152945-fceratto.json
15:17 sukhe@dns1004: END - running authdns-update
15:14 sukhe@dns1004: START - running authdns-update
15:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P75125 and previous config saved to /var/cache/conftool/dbconfig/20250416-151438-fceratto.json
14:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T391056)', diff saved to https://phabricator.wikimedia.org/P75124 and previous config saved to /var/cache/conftool/dbconfig/20250416-145928-fceratto.json
14:57 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T391056)', diff saved to https://phabricator.wikimedia.org/P75123 and previous config saved to /var/cache/conftool/dbconfig/20250416-145718-fceratto.json
14:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1229.eqiad.wmnet with reason: Maintenance
14:53 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
14:52 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
14:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance
14:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75122 and previous config saved to /var/cache/conftool/dbconfig/20250416-144750-fceratto.json
14:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
14:44 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
14:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P75121 and previous config saved to /var/cache/conftool/dbconfig/20250416-143242-fceratto.json
14:29 sukhe@dns1004: END - running authdns-update
14:27 sukhe@dns1004: START - running authdns-update
14:26 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
14:26 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
14:22 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia nginx_1.22.1-9+deb12u1+ech3_amd64.changes: T205378
14:18 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
14:17 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2101.codfw.wmnet on all recursors
14:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P75120 and previous config saved to /var/cache/conftool/dbconfig/20250416-141735-fceratto.json
14:17 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2101.codfw.wmnet on all recursors
14:17 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2099.codfw.wmnet on all recursors
14:17 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2099.codfw.wmnet on all recursors
14:17 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2071.codfw.wmnet on all recursors
14:17 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2071.codfw.wmnet on all recursors
14:16 brouberol@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - brouberol@cumin2002 - T388610
14:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75119 and previous config saved to /var/cache/conftool/dbconfig/20250416-140228-fceratto.json
13:55 Lucas_WMDE: UTC afternoon backport+config window done
13:55 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1045.eqiad.wmnet with reason: Bootstrapping — T389423
13:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
13:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
13:53 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Release campaignEvents extension to azwiki (T390805) (duration: 19m 09s)
13:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
13:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
13:51 jelto: "Imported helm311 3.11.3-4 to bullseye-wikimedia and bookworm-wikimedia - T387548"
13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75118 and previous config saved to /var/cache/conftool/dbconfig/20250416-135121-fceratto.json
13:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T391056)', diff saved to https://phabricator.wikimedia.org/P75117 and previous config saved to /var/cache/conftool/dbconfig/20250416-135059-fceratto.json
13:47 lucaswerkmeister-wmde@deploy1003: mhorsey, lucaswerkmeister-wmde: Continuing with sync
13:44 lucaswerkmeister-wmde@deploy1003: mhorsey, lucaswerkmeister-wmde: Backport for Release campaignEvents extension to azwiki (T390805) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P75116 and previous config saved to /var/cache/conftool/dbconfig/20250416-133552-fceratto.json
13:34 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Release campaignEvents extension to azwiki (T390805)
13:28 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for search-redirect: fix case-sensitivity of project name (T391297) (duration: 22m 55s)
13:24 godog: finish rollout of thanos 0.38 to prometheus* - T383966
13:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P75115 and previous config saved to /var/cache/conftool/dbconfig/20250416-132043-fceratto.json
13:20 lucaswerkmeister-wmde@deploy1003: wargo, lucaswerkmeister-wmde: Continuing with sync
13:18 godog: bounce thanos on titan100* - overload
13:17 lucaswerkmeister-wmde@deploy1003: wargo, lucaswerkmeister-wmde: Backport for search-redirect: fix case-sensitivity of project name (T391297) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:06 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for search-redirect: fix case-sensitivity of project name (T391297)
13:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T391056)', diff saved to https://phabricator.wikimedia.org/P75114 and previous config saved to /var/cache/conftool/dbconfig/20250416-130536-fceratto.json
13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T391056)', diff saved to https://phabricator.wikimedia.org/P75113 and previous config saved to /var/cache/conftool/dbconfig/20250416-130326-fceratto.json
13:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1197.eqiad.wmnet with reason: Maintenance
13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75112 and previous config saved to /var/cache/conftool/dbconfig/20250416-130303-fceratto.json
12:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P75111 and previous config saved to /var/cache/conftool/dbconfig/20250416-124755-fceratto.json
12:37 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aphlict2001.codfw.wmnet with OS bookworm
12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P75109 and previous config saved to /var/cache/conftool/dbconfig/20250416-123248-fceratto.json
12:23 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
12:23 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
12:17 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75108 and previous config saved to /var/cache/conftool/dbconfig/20250416-121742-fceratto.json
12:17 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75107 and previous config saved to /var/cache/conftool/dbconfig/20250416-121532-fceratto.json
12:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: Maintenance
12:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75106 and previous config saved to /var/cache/conftool/dbconfig/20250416-121509-fceratto.json
12:14 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:14 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:13 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:13 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:11 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
12:08 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
12:00 cgoubert@deploy1003: Finished scap sync-world: Move mwscript wrapper from base image to copy on build - T391665 (duration: 50m 43s)
12:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P75104 and previous config saved to /var/cache/conftool/dbconfig/20250416-120002-fceratto.json
11:57 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
11:57 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
11:52 aokoth@cumin1002: START - Cookbook sre.hosts.reimage for host aphlict2001.codfw.wmnet with OS bookworm
11:52 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:51 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
11:51 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on aphlict2001.codfw.wmnet with reason: Bookworm Re-image
11:51 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
11:51 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P75103 and previous config saved to /var/cache/conftool/dbconfig/20250416-114455-fceratto.json
11:41 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:41 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
11:37 jelto: temporarily disable query sites on miscweb vms - T350793
11:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75102 and previous config saved to /var/cache/conftool/dbconfig/20250416-112948-fceratto.json
11:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75101 and previous config saved to /var/cache/conftool/dbconfig/20250416-111822-fceratto.json
11:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: Maintenance
11:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T391056)', diff saved to https://phabricator.wikimedia.org/P75100 and previous config saved to /var/cache/conftool/dbconfig/20250416-111759-fceratto.json
11:11 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:10 cgoubert@deploy1003: Started scap sync-world: Move mwscript wrapper from base image to copy on build - T391665
11:09 cmooney@cumin1002: START - Cookbook sre.dns.netbox
11:06 claime: Rebuilding php base images to pick up 1135922 - T391665
11:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P75099 and previous config saved to /var/cache/conftool/dbconfig/20250416-110252-fceratto.json
10:58 cgoubert@deploy1003: Finished scap build-images: (no justification provided) (duration: 05m 36s)
10:52 cgoubert@deploy1003: Started scap build-images: (no justification provided)
10:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P75098 and previous config saved to /var/cache/conftool/dbconfig/20250416-104744-fceratto.json
10:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T391056)', diff saved to https://phabricator.wikimedia.org/P75097 and previous config saved to /var/cache/conftool/dbconfig/20250416-103236-fceratto.json
10:29 MichaelG_WMF: migr@mwmaint1002:/srv/mediawiki/php-1.44.0-wmf.24$ time mwscript ./extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki ruwiki --verbose #T391695
10:23 MichaelG_WMF: migr@mwmaint1002:/srv/mediawiki/php-1.44.0-wmf.24$ time mwscript ./extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki frwiki --verbose #T391695
10:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T391056)', diff saved to https://phabricator.wikimedia.org/P75096 and previous config saved to /var/cache/conftool/dbconfig/20250416-102110-fceratto.json
10:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:20 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: Maintenance
10:19 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database nupwiki (T390714)
10:19 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database nupwiki (T390714)
10:17 MichaelG_WMF: migr@mwmaint1002:/srv/mediawiki/php-1.44.0-wmf.25$ mwscript ./extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki testwiki --verbose #T391695
10:13 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:11 cmooney@cumin1002: START - Cookbook sre.dns.netbox
09:54 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet with OS bookworm
09:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for Change default thumbnail size to 250px (T355914) (duration: 19m 35s)
09:36 ladsgroup@deploy1003: ladsgroup: Continuing with sync
09:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: host reimage
09:35 ladsgroup@deploy1003: ladsgroup: Backport for Change default thumbnail size to 250px (T355914) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:32 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: host reimage
09:23 ladsgroup@deploy1003: Started scap sync-world: Backport for Change default thumbnail size to 250px (T355914)
09:22 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 100% (T360589) (duration: 19m 05s)
09:18 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve-ctrl1001.eqiad.wmnet with OS bookworm
09:15 vgutierrez: repooling cp4047 - T387238
09:15 ladsgroup@deploy1003: ladsgroup: Continuing with sync
09:15 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 100% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:02 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 100% (T360589)
09:02 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command '['helmfile', '-e', 'eqiad', '--selector', 'name=main', 'write-values', '--output-file-template', '/tmp/tmpsh_tee3p']' returned non-zero exit status 3. (scap version: 4.153.0) (duration: 15m 58s)
08:59 ladsgroup@deploy1003: ladsgroup: Continuing with sync
08:58 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 100% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 100% (T360589)
08:16 akosiaris: destroy the "main" helmfile releases for mw-wikifunctions. The service is now being powered by the single version MediaWiki HTTP routing solution releases, this is a cleanup.
07:50 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
07:26 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:26 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:02 elukey: powercycle ml-serve2007 - OEM event registered in getsel (seems DIMM-related)
06:09 volans: installing spicerack v10.1.0 on cumin1002
05:38 volans: installing spicerack v10.1.0 on cumin2002
02:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75094 and previous config saved to /var/cache/conftool/dbconfig/20250416-023052-fceratto.json
02:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P75093 and previous config saved to /var/cache/conftool/dbconfig/20250416-021544-fceratto.json
02:05 ejegg: payments-wiki upgraded from ba6e8d65 to 4ad609b4
02:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P75092 and previous config saved to /var/cache/conftool/dbconfig/20250416-020036-fceratto.json
01:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75091 and previous config saved to /var/cache/conftool/dbconfig/20250416-014529-fceratto.json
01:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75090 and previous config saved to /var/cache/conftool/dbconfig/20250416-012924-fceratto.json
01:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: Maintenance
01:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75089 and previous config saved to /var/cache/conftool/dbconfig/20250416-012901-fceratto.json
01:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P75088 and previous config saved to /var/cache/conftool/dbconfig/20250416-011353-fceratto.json
00:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P75087 and previous config saved to /var/cache/conftool/dbconfig/20250416-005846-fceratto.json
00:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75086 and previous config saved to /var/cache/conftool/dbconfig/20250416-004338-fceratto.json
00:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75085 and previous config saved to /var/cache/conftool/dbconfig/20250416-002725-fceratto.json
00:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2221.codfw.wmnet with reason: Maintenance
00:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T391056)', diff saved to https://phabricator.wikimedia.org/P75084 and previous config saved to /var/cache/conftool/dbconfig/20250416-002703-fceratto.json
00:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P75083 and previous config saved to /var/cache/conftool/dbconfig/20250416-001156-fceratto.json
00:02 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610

2025-04-15

23:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P75082 and previous config saved to /var/cache/conftool/dbconfig/20250415-235649-fceratto.json
23:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2112.codfw.wmnet with OS bullseye
23:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T391056)', diff saved to https://phabricator.wikimedia.org/P75081 and previous config saved to /var/cache/conftool/dbconfig/20250415-234142-fceratto.json
23:32 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2112.codfw.wmnet with reason: host reimage
23:27 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2112.codfw.wmnet with reason: host reimage
23:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T391056)', diff saved to https://phabricator.wikimedia.org/P75080 and previous config saved to /var/cache/conftool/dbconfig/20250415-232535-fceratto.json
23:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: Maintenance
23:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T391056)', diff saved to https://phabricator.wikimedia.org/P75079 and previous config saved to /var/cache/conftool/dbconfig/20250415-232511-fceratto.json
23:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2112
23:11 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2112
23:11 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2112.codfw.wmnet with OS bullseye
23:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P75078 and previous config saved to /var/cache/conftool/dbconfig/20250415-231003-fceratto.json
23:10 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2112.codfw.wmnet on all recursors
23:10 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2112.codfw.wmnet on all recursors
23:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2112 to cirrussearch2112
23:09 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2112
23:09 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2112
23:09 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2112 to cirrussearch2112 - bking@cumin2002"
22:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P75077 and previous config saved to /var/cache/conftool/dbconfig/20250415-225456-fceratto.json
22:52 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2112 to cirrussearch2112 - bking@cumin2002"
22:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T391056)', diff saved to https://phabricator.wikimedia.org/P75076 and previous config saved to /var/cache/conftool/dbconfig/20250415-223949-fceratto.json
22:35 bking@cumin2002: START - Cookbook sre.dns.netbox
22:35 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2112 to cirrussearch2112
22:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T391056)', diff saved to https://phabricator.wikimedia.org/P75075 and previous config saved to /var/cache/conftool/dbconfig/20250415-222316-fceratto.json
22:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: Maintenance
22:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2103.codfw.wmnet with OS bullseye
22:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2200.codfw.wmnet with reason: Maintenance
21:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2198.codfw.wmnet with reason: Maintenance
21:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75074 and previous config saved to /var/cache/conftool/dbconfig/20250415-215714-fceratto.json
21:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1180.eqiad.wmnet with OS bullseye
21:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2103.codfw.wmnet with reason: host reimage
21:46 urandom: bootstrapping Cassandra/restbase1045-{a,b,c} — T389423
21:44 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2103.codfw.wmnet with reason: host reimage
21:42 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1045.eqiad.wmnet with reason: Bootstrapping — T389423
21:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P75073 and previous config saved to /var/cache/conftool/dbconfig/20250415-214206-fceratto.json
21:41 jforrester@deploy1003: Finished scap sync-world: Backport for FetchHandler: Disable on non-repo wikis (T392014), FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014), FetchHandler: Disable on non-repo wikis (T392014), [[gerrit:1136808|FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014)]]
21:27 jforrester@deploy1003: jforrester: Continuing with sync
21:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2103
21:27 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2103
21:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2103
21:27 jforrester@deploy1003: jforrester: Backport for FetchHandler: Disable on non-repo wikis (T392014), FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014), FetchHandler: Disable on non-repo wikis (T392014), FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014) synced to
21:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2103.codfw.wmnet 222.32.192.10.in-addr.arpa 2.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2103.codfw.wmnet 222.32.192.10.in-addr.arpa 2.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:27 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2103 - bking@cumin2002"
21:27 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2103 - bking@cumin2002"
21:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P75072 and previous config saved to /var/cache/conftool/dbconfig/20250415-212659-fceratto.json
21:23 bking@cumin2002: START - Cookbook sre.dns.netbox
21:22 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2103
21:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2103.codfw.wmnet with OS bullseye
21:22 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2103.codfw.wmnet on all recursors
21:22 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2103.codfw.wmnet on all recursors
21:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2103 to cirrussearch2103
21:21 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2103
21:21 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2103
21:21 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:21 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2103 to cirrussearch2103 - bking@cumin2002"
21:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2103 to cirrussearch2103 - bking@cumin2002"
21:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75071 and previous config saved to /var/cache/conftool/dbconfig/20250415-211152-fceratto.json
21:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1180.eqiad.wmnet with OS bullseye
21:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1180.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
20:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75070 and previous config saved to /var/cache/conftool/dbconfig/20250415-205427-fceratto.json
20:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: Maintenance
20:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T391056)', diff saved to https://phabricator.wikimedia.org/P75069 and previous config saved to /var/cache/conftool/dbconfig/20250415-205416-fceratto.json
20:53 bking@cumin2002: START - Cookbook sre.dns.netbox
20:53 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2103 to cirrussearch2103
20:51 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1180.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
20:48 jforrester@deploy1003: Started scap sync-world: Backport for FetchHandler: Disable on non-repo wikis (T392014), FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014), FetchHandler: Disable on non-repo wikis (T392014), [[gerrit:1136808|FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014)]]
20:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P75068 and previous config saved to /var/cache/conftool/dbconfig/20250415-203909-fceratto.json
20:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2082.codfw.wmnet with OS bullseye
20:27 volans: uploaded spicerack_10.1.0 to apt.wikimedia.org bullseye-wikimedia
20:24 jforrester@deploy1003: Finished scap sync-world: Backport for wikimaniawiki: fix add/remove groups (T389729) (duration: 21m 04s)
20:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P75067 and previous config saved to /var/cache/conftool/dbconfig/20250415-202401-fceratto.json
20:17 jforrester@deploy1003: robertsky, jforrester: Continuing with sync
20:15 jforrester@deploy1003: robertsky, jforrester: Backport for wikimaniawiki: fix add/remove groups (T389729) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2082.codfw.wmnet with reason: host reimage
20:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T391056)', diff saved to https://phabricator.wikimedia.org/P75066 and previous config saved to /var/cache/conftool/dbconfig/20250415-200855-fceratto.json
20:07 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2082.codfw.wmnet with reason: host reimage
20:03 jforrester@deploy1003: Started scap sync-world: Backport for wikimaniawiki: fix add/remove groups (T389729)
19:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2082
19:52 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2082
19:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T391056)', diff saved to https://phabricator.wikimedia.org/P75065 and previous config saved to /var/cache/conftool/dbconfig/20250415-195157-fceratto.json
19:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: Maintenance
19:51 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2082
19:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2082.codfw.wmnet 87.32.192.10.in-addr.arpa 7.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:51 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2082.codfw.wmnet 87.32.192.10.in-addr.arpa 7.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2082 - bking@cumin2002"
19:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T391056)', diff saved to https://phabricator.wikimedia.org/P75064 and previous config saved to /var/cache/conftool/dbconfig/20250415-195134-fceratto.json
19:51 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2082 - bking@cumin2002"
19:47 bking@cumin2002: START - Cookbook sre.dns.netbox
19:46 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2082
19:46 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet
19:46 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2082.codfw.wmnet with OS bullseye
19:46 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2082.codfw.wmnet on all recursors
19:46 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2082.codfw.wmnet on all recursors
19:46 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2082 to cirrussearch2082
19:45 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2082
19:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P75063 and previous config saved to /var/cache/conftool/dbconfig/20250415-193627-fceratto.json
19:35 jforrester@deploy1003: Finished scap sync-world: Backport for VE: Start setting wgVisualEditorMobileInsertMenu, default to off (T388604), VE: Set wgVisualEditorMobileInsertMenu true on Wikifunctions client wikis (T383145 T388604), [wikifunctionswiki] Enable Wikifunctions client mode (T383106), [[gerrit:1126662|[dagwiki] Enable Wikifunctions client mode (T383106)]]
19:31 jforrester@deploy1003: jforrester: Continuing with sync
19:25 jforrester@deploy1003: jforrester: Backport for VE: Start setting wgVisualEditorMobileInsertMenu, default to off (T388604), VE: Set wgVisualEditorMobileInsertMenu true on Wikifunctions client wikis (T383145 T388604), [wikifunctionswiki] Enable Wikifunctions client mode (T383106), [dagwiki] Enable Wikifunctions client mode (T383106) synced to t
19:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P75062 and previous config saved to /var/cache/conftool/dbconfig/20250415-192120-fceratto.json
19:14 jforrester@deploy1003: Started scap sync-world: Backport for VE: Start setting wgVisualEditorMobileInsertMenu, default to off (T388604), VE: Set wgVisualEditorMobileInsertMenu true on Wikifunctions client wikis (T383145 T388604), [wikifunctionswiki] Enable Wikifunctions client mode (T383106), [[gerrit:1126662|[dagwiki] Enable Wikifunctions client mode (T383106)]]
19:10 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2082
19:10 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:10 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2082 to cirrussearch2082 - bking@cumin2002"
19:10 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2082 to cirrussearch2082 - bking@cumin2002"
19:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T391056)', diff saved to https://phabricator.wikimedia.org/P75061 and previous config saved to /var/cache/conftool/dbconfig/20250415-190613-fceratto.json
19:05 bking@cumin2002: START - Cookbook sre.dns.netbox
19:05 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2082 to cirrussearch2082
19:03 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
18:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T391056)', diff saved to https://phabricator.wikimedia.org/P75060 and previous config saved to /var/cache/conftool/dbconfig/20250415-185000-fceratto.json
18:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
18:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: Maintenance
18:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T391056)', diff saved to https://phabricator.wikimedia.org/P75059 and previous config saved to /var/cache/conftool/dbconfig/20250415-184921-fceratto.json
18:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P75058 and previous config saved to /var/cache/conftool/dbconfig/20250415-183413-fceratto.json
18:29 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.25 refs T386220
18:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P75057 and previous config saved to /var/cache/conftool/dbconfig/20250415-181906-fceratto.json
18:05 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
18:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T391056)', diff saved to https://phabricator.wikimedia.org/P75056 and previous config saved to /var/cache/conftool/dbconfig/20250415-180400-fceratto.json
18:01 sukhe: removing from reprepro -C component/nginx-ech libssl and openssl packages
18:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_eqiad and A:cp
17:57 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_eqiad and A:cp
17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T391056)', diff saved to https://phabricator.wikimedia.org/P75055 and previous config saved to /var/cache/conftool/dbconfig/20250415-174734-fceratto.json
17:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: Maintenance
17:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T391056)', diff saved to https://phabricator.wikimedia.org/P75054 and previous config saved to /var/cache/conftool/dbconfig/20250415-174653-fceratto.json
17:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P75053 and previous config saved to /var/cache/conftool/dbconfig/20250415-173146-fceratto.json
17:24 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@f650091]: Pickup latest artifacts. T391280. (duration: 01m 08s)
17:23 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@f650091]: Pickup latest artifacts. T391280.
17:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P75052 and previous config saved to /var/cache/conftool/dbconfig/20250415-171639-fceratto.json
17:14 sukhe@dns1004: END - running authdns-update
17:11 sukhe@dns1004: START - running authdns-update
17:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T391056)', diff saved to https://phabricator.wikimedia.org/P75051 and previous config saved to /var/cache/conftool/dbconfig/20250415-170132-fceratto.json
16:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1253 (T391056)', diff saved to https://phabricator.wikimedia.org/P75050 and previous config saved to /var/cache/conftool/dbconfig/20250415-165922-fceratto.json
16:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: Maintenance
16:59 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
16:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T391056)', diff saved to https://phabricator.wikimedia.org/P75049 and previous config saved to /var/cache/conftool/dbconfig/20250415-165859-fceratto.json
16:58 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
16:48 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
16:46 sukhe@dns1004: END - running authdns-update
16:46 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
16:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2102.codfw.wmnet on all recursors
16:45 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2102.codfw.wmnet on all recursors
16:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2098.codfw.wmnet on all recursors
16:45 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2098.codfw.wmnet on all recursors
16:45 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from elastic2098 to cirrussearch2098
16:45 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P75048 and previous config saved to /var/cache/conftool/dbconfig/20250415-164350-fceratto.json
16:43 sukhe@dns1004: START - running authdns-update
16:42 bking@cumin2002: START - Cookbook sre.dns.netbox
16:42 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2098 to cirrussearch2098
16:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P75047 and previous config saved to /var/cache/conftool/dbconfig/20250415-162842-fceratto.json
16:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2059.codfw.wmnet with OS bullseye
16:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T391056)', diff saved to https://phabricator.wikimedia.org/P75046 and previous config saved to /var/cache/conftool/dbconfig/20250415-161335-fceratto.json
16:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2059.codfw.wmnet with reason: host reimage
16:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2059.codfw.wmnet with reason: host reimage
15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T391056)', diff saved to https://phabricator.wikimedia.org/P75044 and previous config saved to /var/cache/conftool/dbconfig/20250415-155939-fceratto.json
15:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: Maintenance
15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T391056)', diff saved to https://phabricator.wikimedia.org/P75043 and previous config saved to /var/cache/conftool/dbconfig/20250415-155914-fceratto.json
15:58 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
15:57 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
15:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2059
15:47 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2059
15:47 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2059
15:47 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2059.codfw.wmnet 5.32.192.10.in-addr.arpa 5.0.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:47 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2059.codfw.wmnet 5.32.192.10.in-addr.arpa 5.0.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:47 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:47 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2059 - bking@cumin2002"
15:47 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2059 - bking@cumin2002"
15:45 ladsgroup@deploy1003: Finished scap sync-world: Backport for Revert^2 "Bump thumbnail steps to 95%" (duration: 21m 02s)
15:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P75042 and previous config saved to /var/cache/conftool/dbconfig/20250415-154407-fceratto.json
15:42 bking@cumin2002: START - Cookbook sre.dns.netbox
15:42 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2059
15:42 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2059.codfw.wmnet with OS bullseye
15:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2059.codfw.wmnet on all recursors
15:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2059.codfw.wmnet on all recursors
15:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2059 to cirrussearch2059
15:41 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2059
15:40 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2059
15:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:40 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2059 to cirrussearch2059 - bking@cumin2002"
15:39 ladsgroup@deploy1003: ladsgroup: Continuing with sync
15:36 ladsgroup@deploy1003: ladsgroup: Backport for Revert^2 "Bump thumbnail steps to 95%" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P75041 and previous config saved to /var/cache/conftool/dbconfig/20250415-152901-fceratto.json
15:24 ladsgroup@deploy1003: Started scap sync-world: Backport for Revert^2 "Bump thumbnail steps to 95%"
15:22 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2059 to cirrussearch2059 - bking@cumin2002"
15:17 dzahn@deploy1003: Finished deploy [releng/jenkins-deploy@c274545] (releasing): T391590 (duration: 01m 14s)
15:16 dzahn@deploy1003: Started deploy [releng/jenkins-deploy@c274545] (releasing): T391590
15:16 bking@cumin2002: START - Cookbook sre.dns.netbox
15:16 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2059 to cirrussearch2059
15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T391056)', diff saved to https://phabricator.wikimedia.org/P75038 and previous config saved to /var/cache/conftool/dbconfig/20250415-151354-fceratto.json
15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T391056)', diff saved to https://phabricator.wikimedia.org/P75037 and previous config saved to /var/cache/conftool/dbconfig/20250415-151144-fceratto.json
15:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T391056)', diff saved to https://phabricator.wikimedia.org/P75036 and previous config saved to /var/cache/conftool/dbconfig/20250415-151121-fceratto.json
14:57 sbassett: Undeployed security patch for T391343 (reapplied during recent scap backport, patch now removed from deployment hosts)
14:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
14:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
14:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P75035 and previous config saved to /var/cache/conftool/dbconfig/20250415-145613-fceratto.json
14:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P75034 and previous config saved to /var/cache/conftool/dbconfig/20250415-144106-fceratto.json
14:40 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
14:39 cgoubert@deploy1003: Finished scap sync-world: Backport for shwiktionary: Add bs as import source (T391621) (duration: 19m 28s)
14:39 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
14:38 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
14:38 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_drmrs
14:33 cgoubert@deploy1003: aleksandar, cgoubert: Continuing with sync
14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_drmrs
14:31 cgoubert@deploy1003: aleksandar, cgoubert: Backport for shwiktionary: Add bs as import source (T391621) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T391056)', diff saved to https://phabricator.wikimedia.org/P75033 and previous config saved to /var/cache/conftool/dbconfig/20250415-142558-fceratto.json
14:25 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqiad and A:cp
14:25 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqiad and A:cp
14:25 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in eqiad - T391334
14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T391056)', diff saved to https://phabricator.wikimedia.org/P75032 and previous config saved to /var/cache/conftool/dbconfig/20250415-142349-fceratto.json
14:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: Maintenance
14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T391056)', diff saved to https://phabricator.wikimedia.org/P75031 and previous config saved to /var/cache/conftool/dbconfig/20250415-142327-fceratto.json
14:22 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
14:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2065.codfw.wmnet with OS bullseye
14:20 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_esams and not P{cp3073.esams.wmnet} and A:cp
14:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:20 cgoubert@deploy1003: Started scap sync-world: Backport for shwiktionary: Add bs as import source (T391621)
14:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:18 cgoubert@deploy1003: Finished scap sync-world: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695) (duration: 18m 30s)
14:15 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_esams and not P{cp3081.esams.wmnet} and A:cp
14:11 cgoubert@deploy1003: migr, cgoubert: Continuing with sync
14:11 cgoubert@deploy1003: migr, cgoubert: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695) synced to the testservers (https://wikitech.wikimedia
14:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P75030 and previous config saved to /var/cache/conftool/dbconfig/20250415-140820-fceratto.json
14:07 urandom: bootstrapping Cassandra/restbase1044-c — T389423
14:04 sukhe@dns1004: END - running authdns-update
14:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2065.codfw.wmnet with reason: host reimage
14:01 sukhe@dns1004: START - running authdns-update
13:59 cgoubert@deploy1003: Started scap sync-world: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695)
13:59 sukhe@dns1004: END - running authdns-update
13:56 sukhe@dns1004: START - running authdns-update
13:56 cgoubert@deploy1003: Finished scap sync-world: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695) (duration: 18m 27s)
13:55 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
13:55 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2065.codfw.wmnet with reason: host reimage
13:55 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
13:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P75029 and previous config saved to /var/cache/conftool/dbconfig/20250415-135313-fceratto.json
13:49 cgoubert@deploy1003: migr, cgoubert: Continuing with sync
13:49 cgoubert@deploy1003: migr, cgoubert: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695) synced to the testservers (https://wikitech.wikimedia
13:45 tappof@cumin1002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
13:45 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
13:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2065
13:40 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2065
13:40 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2065
13:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2065.codfw.wmnet 68.32.192.10.in-addr.arpa 8.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:40 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2065.codfw.wmnet 68.32.192.10.in-addr.arpa 8.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:40 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2065 - bking@cumin2002"
13:40 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2065 - bking@cumin2002"
13:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T391056)', diff saved to https://phabricator.wikimedia.org/P75028 and previous config saved to /var/cache/conftool/dbconfig/20250415-133807-fceratto.json
13:38 cgoubert@deploy1003: Started scap sync-world: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695)
13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T391056)', diff saved to https://phabricator.wikimedia.org/P75027 and previous config saved to /var/cache/conftool/dbconfig/20250415-133558-fceratto.json
13:35 bking@cumin2002: START - Cookbook sre.dns.netbox
13:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: Maintenance
13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T391056)', diff saved to https://phabricator.wikimedia.org/P75025 and previous config saved to /var/cache/conftool/dbconfig/20250415-133536-fceratto.json
13:34 cgoubert@deploy1003: Finished scap sync-world: Backport for updating wikimaniawiki namespace configurations: (T389729), update wikimaniawiki perms configurations: (T389729) (duration: 28m 46s)
13:30 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2065
13:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2065.codfw.wmnet with OS bullseye
13:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2065.codfw.wmnet on all recursors
13:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2065.codfw.wmnet on all recursors
13:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2065 to cirrussearch2065
13:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2065
13:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2065
13:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2065 to cirrussearch2065 - bking@cumin2002"
13:28 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2065 to cirrussearch2065 - bking@cumin2002"
13:25 cgoubert@deploy1003: cgoubert, robertsky: Continuing with sync
13:23 bking@cumin2002: START - Cookbook sre.dns.netbox
13:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2065 to cirrussearch2065
13:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P75024 and previous config saved to /var/cache/conftool/dbconfig/20250415-132029-fceratto.json
13:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
13:17 cgoubert@deploy1003: cgoubert, robertsky: Backport for updating wikimaniawiki namespace configurations: (T389729), update wikimaniawiki perms configurations: (T389729) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:16 sukhe@dns1004: END - running authdns-update
13:14 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Andy Cooper out of all services on: 2393 hosts
13:13 sukhe@dns1004: START - running authdns-update
13:11 sukhe@dns1004: END - running authdns-update
13:09 sukhe@dns1004: START - running authdns-update
13:05 cgoubert@deploy1003: Started scap sync-world: Backport for updating wikimaniawiki namespace configurations: (T389729), update wikimaniawiki perms configurations: (T389729)
13:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P75023 and previous config saved to /var/cache/conftool/dbconfig/20250415-130522-fceratto.json
13:02 cgoubert@deploy1003: Finished scap sync-world: test rebuild to test swift eventual consistency (duration: 30m 09s)
13:02 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
13:02 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
13:02 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
13:02 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
13:02 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
13:01 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
12:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
12:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
12:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T391056)', diff saved to https://phabricator.wikimedia.org/P75022 and previous config saved to /var/cache/conftool/dbconfig/20250415-125014-fceratto.json
12:49 cgoubert@deploy1003: cgoubert: Continuing with sync
12:49 cgoubert@deploy1003: cgoubert: test rebuild to test swift eventual consistency synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T391056)', diff saved to https://phabricator.wikimedia.org/P75021 and previous config saved to /var/cache/conftool/dbconfig/20250415-124805-fceratto.json
12:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: Maintenance
12:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75020 and previous config saved to /var/cache/conftool/dbconfig/20250415-124743-fceratto.json
12:42 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for durum2002.codfw.wmnet
12:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for durum2002.codfw.wmnet
12:33 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Andy Cooper out of all services on: 2393 hosts
12:33 cgoubert@deploy1003: Started scap sync-world: test rebuild to test swift eventual consistency
12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P75018 and previous config saved to /var/cache/conftool/dbconfig/20250415-123236-fceratto.json
12:32 cgoubert@deploy1003: Finished scap build-images: (no justification provided) (duration: 05m 27s)
12:26 cgoubert@deploy1003: Started scap build-images: (no justification provided)
12:26 cgoubert@deploy1003: build-images aborted: (no justification provided) (duration: 00m 01s)
12:26 cgoubert@deploy1003: Started scap build-images: (no justification provided)
12:26 cgoubert@deploy1003: build-images aborted: (no justification provided) (duration: 01m 12s)
12:25 cgoubert@deploy1003: Started scap build-images: (no justification provided)
12:21 godog: upgrade thanos to 0.38.0 on O:prometheus::pop - T383966
12:20 godog: upgrade thanos to 0.38.0 on O:prometheus::pop
12:20 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum2002.codfw.wmnet with OS bookworm
12:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P75017 and previous config saved to /var/cache/conftool/dbconfig/20250415-121728-fceratto.json
12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75016 and previous config saved to /var/cache/conftool/dbconfig/20250415-120222-fceratto.json
12:01 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
12:00 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75015 and previous config saved to /var/cache/conftool/dbconfig/20250415-120013-fceratto.json
12:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1174.eqiad.wmnet with reason: Maintenance
11:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
11:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:45 sukhe: sudo cumin 'A:durum and not P{durum2002*}' 'run-puppet-agent --enable "rolling out CR 1132669"'
11:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75014 and previous config saved to /var/cache/conftool/dbconfig/20250415-114501-fceratto.json
11:42 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum2002.codfw.wmnet with OS bookworm
11:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P75013 and previous config saved to /var/cache/conftool/dbconfig/20250415-112955-fceratto.json
11:25 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
11:25 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
11:25 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
11:25 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
11:24 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
11:24 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
11:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P75012 and previous config saved to /var/cache/conftool/dbconfig/20250415-111447-fceratto.json
11:08 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_esams and not P{cp3081.esams.wmnet} and A:cp
11:08 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_esams and not P{cp3073.esams.wmnet} and A:cp
11:07 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in esams - T391334
11:07 cgoubert@deploy1003: Started scap sync-world: test rebuild to look at logs
11:07 sukhe@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on durum2002.codfw.wmnet with reason: testing
11:05 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp[5023-5024].eqsin.wmnet} and A:cp
10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75011 and previous config saved to /var/cache/conftool/dbconfig/20250415-105941-fceratto.json
10:58 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_eqsin
10:52 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_drmrs
10:52 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_drmrs
10:52 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in drmrs - T391334
10:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75010 and previous config saved to /var/cache/conftool/dbconfig/20250415-104235-fceratto.json
10:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: Maintenance
10:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T391056)', diff saved to https://phabricator.wikimedia.org/P75009 and previous config saved to /var/cache/conftool/dbconfig/20250415-104212-fceratto.json
10:41 sukhe: enable puppet on durum2002
10:40 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_codfw
10:39 ladsgroup@deploy1003: sync-world aborted: Backport for Bump thumbnail steps to 95% (T360589) (duration: 05m 08s)
10:38 sukhe: sudo cumin 'A:durum' 'disable-puppet "rolling out CR 1132669"'
10:37 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_codfw
10:34 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 95% (T360589)
10:33 ladsgroup@deploy1003: sync-world aborted: Backport for Bump thumbnail steps to 95% (T360589) (duration: 14m 11s)
10:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P75008 and previous config saved to /var/cache/conftool/dbconfig/20250415-102705-fceratto.json
10:26 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp[5023-5024].eqsin.wmnet} and A:cp
10:24 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-text_eqsin
10:19 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 95% (T360589)
10:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P75007 and previous config saved to /var/cache/conftool/dbconfig/20250415-101158-fceratto.json
10:00 dcausse@deploy1003: Finished deploy [wdqs/wdqs@fe88851] (wcqs): version 0.3.156 (duration: 02m 25s)
09:58 dcausse@deploy1003: Started deploy [wdqs/wdqs@fe88851] (wcqs): version 0.3.156
09:57 dcausse@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: version 0.3.156 (T326311) (duration: 14m 31s)
09:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T391056)', diff saved to https://phabricator.wikimedia.org/P75006 and previous config saved to /var/cache/conftool/dbconfig/20250415-095650-fceratto.json
09:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T391056)', diff saved to https://phabricator.wikimedia.org/P75005 and previous config saved to /var/cache/conftool/dbconfig/20250415-095442-fceratto.json
09:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: Maintenance
09:43 dcausse@deploy1003: Started deploy [wdqs/wdqs@fe88851]: version 0.3.156 (T326311)
09:15 jnuche@deploy1003: sync-world aborted: testwikis to 1.44.0-wmf.25 refs T386220 (duration: 14m 36s)
09:01 jnuche@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.25 refs T386220
08:51 dcausse@deploy1003: Finished deploy [wdqs/wdqs@4186ae7] (wcqs): test deploy new scap config to wcqs2001.codfw.wmnet (T221709) (duration: 00m 20s)
08:51 dcausse@deploy1003: Started deploy [wdqs/wdqs@4186ae7] (wcqs): test deploy new scap config to wcqs2001.codfw.wmnet (T221709)
08:42 XioNoX: drain arelion eqsin-codfw link
08:09 dcausse@deploy1003: Finished deploy [wdqs/wdqs@4186ae7]: test deploy new scap config to wdqs2025.codfw.wmnet (T221709) (duration: 00m 18s)
08:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:09 dcausse@deploy1003: Started deploy [wdqs/wdqs@4186ae7]: test deploy new scap config to wdqs2025.codfw.wmnet (T221709)
08:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
07:47 godog: upgrade thanos to 0.38.0 on prometheus100[57] - T383966
07:28 Emperor: make sure all disks are mounted correctly prior to disk-swap testing T391854 ms-be1091
07:28 Emperor: make sure all disks are mounted correctly prior to disk-swap testing T391854
07:10 elukey@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ms-be1091.eqiad.wmnet with reason: dcops maintenance
07:06 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_codfw
07:06 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_codfw
07:06 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqsin
07:05 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
07:05 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqsin
07:04 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in eqsin and codfw - T391334
06:50 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:49 kart_: Updated cxserver to 2025-04-07-053106-production (T390732, T390711)
06:48 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:47 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:46 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:45 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:45 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:44 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc6 T391454', diff saved to https://phabricator.wikimedia.org/P75003 and previous config saved to /var/cache/conftool/dbconfig/20250415-050307-marostegui.json
04:57 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2016.codfw.wmnet,pc1016.eqiad.wmnet with reason: Maintenance
04:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc6 T391454', diff saved to https://phabricator.wikimedia.org/P75002 and previous config saved to /var/cache/conftool/dbconfig/20250415-045700-marostegui.json
04:10 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.22 (duration: 10m 03s)
03:43 mwpresync@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.44.0-wmf.24,1.44.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discov
03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.25 refs T386220
02:32 ejegg: payments-wiki upgraded from ef9284aa to ba6e8d65
02:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1181.eqiad.wmnet with OS bullseye
01:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
01:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1181']
01:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1181']
01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
01:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL

2025-04-14

23:22 urandom: bootstrapping Cassandra/restbase1044-b — T389423
23:12 zabe: zabe@mwmaint1002:~$ cat group2.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php {} --delete /home/zabe/afl_text_table_deletedump/{} --sleep 0.3" # T381599
22:44 ladsgroup@dns1004: END - running authdns-update
22:42 ladsgroup@dns1004: START - running authdns-update
22:34 mutante: deploy1003 - scap install-world -l release2003.codfw.wmnet T391590
22:34 dzahn@deploy1003: Installation of scap version "4.153.0" completed for 1 hosts
22:33 dzahn@deploy1003: Installing scap version "4.153.0" for 1 host(s)
22:30 sbassett: Deployed previous good versions of affected files for T391343
22:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2239.codfw.wmnet with reason: Maintenance
22:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T391056)', diff saved to https://phabricator.wikimedia.org/P75001 and previous config saved to /var/cache/conftool/dbconfig/20250414-222519-fceratto.json
22:20 sbassett: Deployment of security patch for T391343 halted
22:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P75000 and previous config saved to /var/cache/conftool/dbconfig/20250414-221012-fceratto.json
22:06 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2060.codfw.wmnet|cirrussearch2067.codfw.wmnet|cirrussearch2068.codfw.wmnet|cirrussearch2072.codfw.wmnet|cirrussearch2085.codfw.wmnet|cirrussearch2104.codfw.wmnet|cirrussearch2105.codfw.wmnet|cirrussearch2107.codfw.wmnet|cirrussearch2109.codfw.wmnet|cirrussearch2114.codfw.wmnet|cirrussearch2115.codfw.wmnet
21:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P74999 and previous config saved to /var/cache/conftool/dbconfig/20250414-215504-fceratto.json
21:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T391056)', diff saved to https://phabricator.wikimedia.org/P74998 and previous config saved to /var/cache/conftool/dbconfig/20250414-213957-fceratto.json
21:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T391056)', diff saved to https://phabricator.wikimedia.org/P74997 and previous config saved to /var/cache/conftool/dbconfig/20250414-212344-fceratto.json
21:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2227.codfw.wmnet with reason: Maintenance
21:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T391056)', diff saved to https://phabricator.wikimedia.org/P74996 and previous config saved to /var/cache/conftool/dbconfig/20250414-212320-fceratto.json
21:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P74995 and previous config saved to /var/cache/conftool/dbconfig/20250414-210814-fceratto.json
20:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P74994 and previous config saved to /var/cache/conftool/dbconfig/20250414-205307-fceratto.json
20:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T391056)', diff saved to https://phabricator.wikimedia.org/P74993 and previous config saved to /var/cache/conftool/dbconfig/20250414-203800-fceratto.json
20:23 jforrester@deploy1003: Finished scap sync-world: Backport for FunctionCalls: Use base64url encoding rather than raw base64 (T391584), FunctionCalls: Don't error if Wikifunctions.org isn't in client mode yet (T391584), FunctionCalls: Throw an explicable error if json_encode returns null (T391584) (duration: 14m 20s)
20:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T391056)', diff saved to https://phabricator.wikimedia.org/P74992 and previous config saved to /var/cache/conftool/dbconfig/20250414-202152-fceratto.json
20:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: Maintenance
20:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T391056)', diff saved to https://phabricator.wikimedia.org/P74991 and previous config saved to /var/cache/conftool/dbconfig/20250414-202131-fceratto.json
20:17 jforrester@deploy1003: jforrester: Continuing with sync
20:14 jforrester@deploy1003: jforrester: Backport for FunctionCalls: Use base64url encoding rather than raw base64 (T391584), FunctionCalls: Don't error if Wikifunctions.org isn't in client mode yet (T391584), FunctionCalls: Throw an explicable error if json_encode returns null (T391584) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:09 jforrester@deploy1003: Started scap sync-world: Backport for FunctionCalls: Use base64url encoding rather than raw base64 (T391584), FunctionCalls: Don't error if Wikifunctions.org isn't in client mode yet (T391584), FunctionCalls: Throw an explicable error if json_encode returns null (T391584)
20:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P74990 and previous config saved to /var/cache/conftool/dbconfig/20250414-200624-fceratto.json
20:02 mforns@deploy1003: Finished deploy [analytics/refinery@6fe5a7e] (thin): Regular analytics weekly train THIN [analytics/refinery@6fe5a7e3] (duration: 01m 09s)
20:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2109.codfw.wmnet with OS bullseye
20:01 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row D - bking@cumin2002 - T388610
20:01 mforns@deploy1003: Started deploy [analytics/refinery@6fe5a7e] (thin): Regular analytics weekly train THIN [analytics/refinery@6fe5a7e3]
20:00 mforns@deploy1003: Finished deploy [analytics/refinery@6fe5a7e]: Regular analytics weekly train [analytics/refinery@6fe5a7e3] (duration: 03m 31s)
19:57 mforns@deploy1003: Started deploy [analytics/refinery@6fe5a7e]: Regular analytics weekly train [analytics/refinery@6fe5a7e3]
19:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P74989 and previous config saved to /var/cache/conftool/dbconfig/20250414-195117-fceratto.json
19:50 mforns@deploy1003: Finished deploy [analytics/refinery@6fe5a7e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6fe5a7e3] (duration: 02m 44s)
19:47 mforns@deploy1003: Started deploy [analytics/refinery@6fe5a7e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6fe5a7e3]
19:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2109.codfw.wmnet with reason: host reimage
19:36 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2109.codfw.wmnet with reason: host reimage
19:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T391056)', diff saved to https://phabricator.wikimedia.org/P74988 and previous config saved to /var/cache/conftool/dbconfig/20250414-193610-fceratto.json
19:35 mforns@deploy1003: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
19:35 mforns@deploy1003: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
19:35 mforns@deploy1003: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
19:34 mforns@deploy1003: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
19:31 urandom: dropped & recreated 8 commons impact metrics tables — https://phabricator.wikimedia.org/T370470#10687053
19:24 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
19:24 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
19:24 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
19:23 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/data-gateway: apply
19:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2109
19:20 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2109
19:19 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T391056)', diff saved to https://phabricator.wikimedia.org/P74987 and previous config saved to /var/cache/conftool/dbconfig/20250414-191957-fceratto.json
19:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: Maintenance
19:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T391056)', diff saved to https://phabricator.wikimedia.org/P74986 and previous config saved to /var/cache/conftool/dbconfig/20250414-191933-fceratto.json
19:17 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2109
19:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2109.codfw.wmnet 160.48.192.10.in-addr.arpa 0.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:17 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2109.codfw.wmnet 160.48.192.10.in-addr.arpa 0.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:17 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2109 - bking@cumin2002"
19:17 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2109 - bking@cumin2002"
19:13 bking@cumin2002: START - Cookbook sre.dns.netbox
19:10 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2109
19:10 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2109.codfw.wmnet with OS bullseye
19:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2109 to cirrussearch2109
19:07 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2109
19:07 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2109
19:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:07 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2109 to cirrussearch2109 - bking@cumin2002"
19:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2109 to cirrussearch2109 - bking@cumin2002"
19:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P74985 and previous config saved to /var/cache/conftool/dbconfig/20250414-190426-fceratto.json
19:02 bking@cumin2002: START - Cookbook sre.dns.netbox
19:02 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2109 to cirrussearch2109
18:55 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row D - bking@cumin2002 - T388610
18:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P74984 and previous config saved to /var/cache/conftool/dbconfig/20250414-184918-fceratto.json
18:37 jforrester@deploy1003: Finished scap sync-world: Backport for Complete our RecentChanges entry generation and formatting (T386020), Switch test Wikifunctions client deployment from test2wiki to test2iki (T391584), Document Wikifunctions options, adding wgWikiLambdaClientModeOffline (T391584) (duration: 32m 25s)
18:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T391056)', diff saved to https://phabricator.wikimedia.org/P74983 and previous config saved to /var/cache/conftool/dbconfig/20250414-183411-fceratto.json
18:27 jforrester@deploy1003: jforrester: Continuing with sync
18:27 James_F: Run `mwscript sql --wiki=testwiki /srv/mediawiki-staging/php-1.44.0-wmf.24/extensions/WikiLambda/sql/mysql/table-usage.sql` for T391885
18:24 jforrester@deploy1003: jforrester: Backport for Complete our RecentChanges entry generation and formatting (T386020), Switch test Wikifunctions client deployment from test2wiki to test2iki (T391584), Document Wikifunctions options, adding wgWikiLambdaClientModeOffline (T391584) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T391056)', diff saved to https://phabricator.wikimedia.org/P74982 and previous config saved to /var/cache/conftool/dbconfig/20250414-181802-fceratto.json
18:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: Maintenance
18:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74981 and previous config saved to /var/cache/conftool/dbconfig/20250414-181740-fceratto.json
18:05 jforrester@deploy1003: Started scap sync-world: Backport for Complete our RecentChanges entry generation and formatting (T386020), Switch test Wikifunctions client deployment from test2wiki to test2iki (T391584), Document Wikifunctions options, adding wgWikiLambdaClientModeOffline (T391584)
18:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P74980 and previous config saved to /var/cache/conftool/dbconfig/20250414-180232-fceratto.json
17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P74979 and previous config saved to /var/cache/conftool/dbconfig/20250414-174725-fceratto.json
17:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74978 and previous config saved to /var/cache/conftool/dbconfig/20250414-173218-fceratto.json
17:30 swfrench-wmf: running: cumin -b8 -s60 'A:cp-text' 'run-puppet-agent -e "merging ATS config change - T391421"'
17:26 hashar@deploy1003: Finished deploy [integration/docroot@e92740c]: opensource: remove OOjs Router - T358813 (duration: 00m 10s)
17:25 hashar@deploy1003: Started deploy [integration/docroot@e92740c]: opensource: remove OOjs Router - T358813
17:25 swfrench-wmf: running: run-puppet-agent -e "merging ATS config change - T391421" on cp4040
17:20 swfrench-wmf: running: cumin 'A:cp-text' 'disable-puppet "merging ATS config change - T391421"'
17:17 swfrench@deploy1003: Finished scap sync-world: Backport for Remove PHP 8.1 migration WikimediaEvents settings (T391421) (duration: 13m 10s)
17:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74977 and previous config saved to /var/cache/conftool/dbconfig/20250414-171622-fceratto.json
17:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: Maintenance
17:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T391056)', diff saved to https://phabricator.wikimedia.org/P74976 and previous config saved to /var/cache/conftool/dbconfig/20250414-171558-fceratto.json
17:10 swfrench@deploy1003: swfrench: Continuing with sync
17:10 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1181.eqiad.wmnet with OS bullseye
17:08 swfrench@deploy1003: swfrench: Backport for Remove PHP 8.1 migration WikimediaEvents settings (T391421) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:04 swfrench@deploy1003: Started scap sync-world: Backport for Remove PHP 8.1 migration WikimediaEvents settings (T391421)
17:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P74975 and previous config saved to /var/cache/conftool/dbconfig/20250414-170052-fceratto.json
16:56 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_magru
16:56 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_magru
16:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P74974 and previous config saved to /var/cache/conftool/dbconfig/20250414-164545-fceratto.json
16:38 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
16:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T391056)', diff saved to https://phabricator.wikimedia.org/P74973 and previous config saved to /var/cache/conftool/dbconfig/20250414-163037-fceratto.json
16:21 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1181.eqiad.wmnet with OS bullseye
16:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T391056)', diff saved to https://phabricator.wikimedia.org/P74972 and previous config saved to /var/cache/conftool/dbconfig/20250414-161512-fceratto.json
16:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
16:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2156.codfw.wmnet with reason: Maintenance
16:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T391056)', diff saved to https://phabricator.wikimedia.org/P74971 and previous config saved to /var/cache/conftool/dbconfig/20250414-161432-fceratto.json
16:06 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
16:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
16:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_ulsfo and not P{cp4037.ulsfo.wmnet} and A:cp
15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P74970 and previous config saved to /var/cache/conftool/dbconfig/20250414-155925-fceratto.json
15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
15:57 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1181.eqiad.wmnet with OS bullseye
15:56 fceratto@dns1004: END - running authdns-update
15:53 fceratto@dns1004: START - running authdns-update
15:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P74969 and previous config saved to /var/cache/conftool/dbconfig/20250414-154419-fceratto.json
15:44 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:43 fceratto@cumin1002: START - Cookbook sre.dns.netbox
15:40 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:37 urandom: bootstrapping Cassandra/restbase1044-a — T389423
15:37 fceratto@cumin1002: START - Cookbook sre.dns.netbox
15:33 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1044.eqiad.wmnet with reason: Bootstrapping — T389423
15:30 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_ulsfo and not P{cp4047.ulsfo.wmnet} and not P{cp4045.ulsfo.wmnet} and A:cp
15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T391056)', diff saved to https://phabricator.wikimedia.org/P74968 and previous config saved to /var/cache/conftool/dbconfig/20250414-152911-fceratto.json
15:26 volans: deployed homer v0.9.0 to cumin hosts
15:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
15:25 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.9.0 - volans@cumin1002
15:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
15:23 volans@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.9.0 - volans@cumin1002
15:15 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T391056)', diff saved to https://phabricator.wikimedia.org/P74967 and previous config saved to /var/cache/conftool/dbconfig/20250414-151316-fceratto.json
15:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: Maintenance
15:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
15:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance
15:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T391056)', diff saved to https://phabricator.wikimedia.org/P74966 and previous config saved to /var/cache/conftool/dbconfig/20250414-150200-fceratto.json
14:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P74965 and previous config saved to /var/cache/conftool/dbconfig/20250414-144653-fceratto.json
14:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P74964 and previous config saved to /var/cache/conftool/dbconfig/20250414-143146-fceratto.json
14:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2104.codfw.wmnet with OS bullseye
14:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T391056)', diff saved to https://phabricator.wikimedia.org/P74963 and previous config saved to /var/cache/conftool/dbconfig/20250414-141639-fceratto.json
14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T391056)', diff saved to https://phabricator.wikimedia.org/P74962 and previous config saved to /var/cache/conftool/dbconfig/20250414-141227-fceratto.json
14:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
14:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: Maintenance
14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T391056)', diff saved to https://phabricator.wikimedia.org/P74961 and previous config saved to /var/cache/conftool/dbconfig/20250414-141148-fceratto.json
14:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2104.codfw.wmnet with reason: host reimage
14:01 godog: temp disable "backend time" panel using unaggregated big mediawiki metric on "reading web performance" dashboard - T391677
14:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2104.codfw.wmnet with reason: host reimage
13:57 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1178.eqiad.wmnet with OS bullseye
13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P74960 and previous config saved to /var/cache/conftool/dbconfig/20250414-135640-fceratto.json
13:47 arnaudb@cumin1002: END (ERROR) - Cookbook sre.gerrit.failover (exit_code=97) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
13:47 arnaudb@cumin1002: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P74956 and previous config saved to /var/cache/conftool/dbconfig/20250414-134132-fceratto.json
13:41 TheresNoTime: UTC afternoon backport window done
13:40 samtar@deploy1003: Finished scap sync-world: Backport for Enable SUL3 on most remaining beta cluster wikis, punjabiwikimedia, maiwikimedia: fix tagline (T348611) (duration: 12m 00s)
13:38 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia nginx_1.22.1-9+deb12u1+ech2_amd64.changes: T205378
13:33 samtar@deploy1003: matmarex, anzx, samtar: Continuing with sync
13:33 samtar@deploy1003: matmarex, anzx, samtar: Backport for Enable SUL3 on most remaining beta cluster wikis, punjabiwikimedia, maiwikimedia: fix tagline (T348611) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2104
13:30 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2104
13:30 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2104.codfw.wmnet with OS bullseye
13:28 samtar@deploy1003: Started scap sync-world: Backport for Enable SUL3 on most remaining beta cluster wikis, punjabiwikimedia, maiwikimedia: fix tagline (T348611)
13:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cirrussearch2014 to cirrussearch2104
13:27 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2104
13:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2104
13:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:27 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cirrussearch2014 to cirrussearch2104 - bking@cumin2002"
13:26 samtar@deploy1003: Finished scap sync-world: Backport for CentralAuthTokenManager: Log failures for write operations (T390784) (duration: 11m 39s)
13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T391056)', diff saved to https://phabricator.wikimedia.org/P74955 and previous config saved to /var/cache/conftool/dbconfig/20250414-132625-fceratto.json
13:23 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Compatibility with conftool 5.1.0 (take 2) - oblivian@cumin2002"
13:23 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Compatibility with conftool 5.1.0 (take 2) - oblivian@cumin2002
13:22 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Compatibility with conftool 5.1.0 (take 2) - oblivian@cumin2002
13:22 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Compatibility with conftool 5.1.0 (take 2) - oblivian@cumin2002"
13:22 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cirrussearch2014 to cirrussearch2104 - bking@cumin2002"
13:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T391056)', diff saved to https://phabricator.wikimedia.org/P74954 and previous config saved to /var/cache/conftool/dbconfig/20250414-132232-fceratto.json
13:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: Maintenance
13:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T391056)', diff saved to https://phabricator.wikimedia.org/P74953 and previous config saved to /var/cache/conftool/dbconfig/20250414-132210-fceratto.json
13:22 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Compatibility with conftool 5.1.0 - oblivian@cumin2002"
13:22 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Compatibility with conftool 5.1.0 - oblivian@cumin2002
13:21 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Compatibility with conftool 5.1.0 - oblivian@cumin2002
13:21 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Compatibility with conftool 5.1.0 - oblivian@cumin2002"
13:19 samtar@deploy1003: samtar, matmarex: Continuing with sync
13:19 samtar@deploy1003: samtar, matmarex: Backport for CentralAuthTokenManager: Log failures for write operations (T390784) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:18 bking@cumin2002: START - Cookbook sre.dns.netbox
13:18 bking@cumin2002: START - Cookbook sre.hosts.rename from cirrussearch2014 to cirrussearch2104
13:17 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in magru - T391334
13:17 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_magru
13:16 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_magru
13:15 _joe_: installed updates to conftool on cumin hosts
13:14 samtar@deploy1003: Started scap sync-world: Backport for CentralAuthTokenManager: Log failures for write operations (T390784)
13:13 elukey@deploy1003: Finished deploy [docker-pkg/deploy@a555b7b]: Upgrade to 4.0.4 (duration: 00m 38s)
13:13 elukey@deploy1003: Started deploy [docker-pkg/deploy@a555b7b]: Upgrade to 4.0.4
13:13 godog: remove old LVs from prometheus[12]00[56] - T383232
13:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P74952 and previous config saved to /var/cache/conftool/dbconfig/20250414-130703-fceratto.json
13:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc5 T391454', diff saved to https://phabricator.wikimedia.org/P74951 and previous config saved to /var/cache/conftool/dbconfig/20250414-130222-marostegui.json
13:01 moritzm: remove ganeti01.svc.eqiad.wmnet cert (replaced by cfssl cert) T357750
12:56 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_ulsfo and not P{cp4037.ulsfo.wmnet} and A:cp
12:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: Maintenance
12:56 jforrester@deploy1003: Finished scap sync-world: Backport for Special pages: Don't just set userCanExecute() but actually run it (T391594), Client mode: Provide WikiLambdaClientModeOffline for SRE to disable, Wikifunctions VE: Add loading and abort state to content editable (T391441), [[gerrit:1136126|logging: Allow through WikiLambdaClient logs at info level an
12:56 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_ulsfo and not P{cp4047.ulsfo.wmnet} and not P{cp4045.ulsfo.wmnet} and A:cp
12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc5 T391454', diff saved to https://phabricator.wikimedia.org/P74950 and previous config saved to /var/cache/conftool/dbconfig/20250414-125511-marostegui.json
12:53 moritzm: remove ganeti01.svc.codfw.wmnet cert (replaced by cfssl cert) T357750
12:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P74949 and previous config saved to /var/cache/conftool/dbconfig/20250414-125156-fceratto.json
12:51 godog: upgrade prometheus2007 to thanos 0.38.0 - T383966
12:50 godog: upgrade prometheus2005 to thanos 0.38.0 - T383966
12:49 moritzm: remove ganeti01.svc.esams.wmnet cert (replaced by cfssl cert) T357750
12:46 jforrester@deploy1003: jforrester: Continuing with sync
12:46 moritzm: remove ganeti01.svc.ulsfo.wmnet cert (replaced by cfssl cert) T357750
12:44 jforrester@deploy1003: jforrester: Backport for Special pages: Don't just set userCanExecute() but actually run it (T391594), Client mode: Provide WikiLambdaClientModeOffline for SRE to disable, Wikifunctions VE: Add loading and abort state to content editable (T391441), logging: Allow through WikiLambdaClient logs at info level and above sync
12:43 moritzm: remove ganeti01.svc.eqsin.wmnet cert (replaced by cfssl cert) T357750
12:36 jforrester@deploy1003: Started scap sync-world: Backport for Special pages: Don't just set userCanExecute() but actually run it (T391594), Client mode: Provide WikiLambdaClientModeOffline for SRE to disable, Wikifunctions VE: Add loading and abort state to content editable (T391441), [[gerrit:1136126|logging: Allow through WikiLambdaClient logs at info level and
12:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T391056)', diff saved to https://phabricator.wikimedia.org/P74948 and previous config saved to /var/cache/conftool/dbconfig/20250414-123649-fceratto.json
12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T391056)', diff saved to https://phabricator.wikimedia.org/P74947 and previous config saved to /var/cache/conftool/dbconfig/20250414-123255-fceratto.json
12:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: Maintenance
12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T391056)', diff saved to https://phabricator.wikimedia.org/P74946 and previous config saved to /var/cache/conftool/dbconfig/20250414-123234-fceratto.json
12:25 cgoubert@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:24 cgoubert@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:24 cgoubert@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:23 cgoubert@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:22 cgoubert@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:22 cgoubert@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P74945 and previous config saved to /var/cache/conftool/dbconfig/20250414-121726-fceratto.json
12:06 jforrester@deploy1003: Started scap sync-world: Backport for Special pages: Don't just set userCanExecute() but actually run it (T391594), Client mode: Provide WikiLambdaClientModeOffline for SRE to disable, Wikifunctions VE: Add loading and abort state to content editable (T391441), logging: Allow through WikiLambdaClient logs at info level and... (gerrit:1136126)
12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P74944 and previous config saved to /var/cache/conftool/dbconfig/20250414-120219-fceratto.json
11:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T391056)', diff saved to https://phabricator.wikimedia.org/P74943 and previous config saved to /var/cache/conftool/dbconfig/20250414-114711-fceratto.json
11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T391056)', diff saved to https://phabricator.wikimedia.org/P74942 and previous config saved to /var/cache/conftool/dbconfig/20250414-114323-fceratto.json
11:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: Maintenance
11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74941 and previous config saved to /var/cache/conftool/dbconfig/20250414-114300-fceratto.json
11:40 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
11:30 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp4045.ulsfo.wmnet} and A:cp
11:30 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp4037.ulsfo.wmnet} and A:cp
11:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
11:28 fceratto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
11:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P74940 and previous config saved to /var/cache/conftool/dbconfig/20250414-112754-fceratto.json
11:27 fceratto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
11:26 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
11:26 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
11:25 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp4045.ulsfo.wmnet} and A:cp
11:25 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
11:25 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp4037.ulsfo.wmnet} and A:cp
11:25 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
11:24 vgutierrez: upload varnishkafka 1.2.0-3 to apt.wm.o (bullseye-wikimedia) - T391334
11:20 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:20 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
11:19 fceratto@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:19 fceratto@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P74939 and previous config saved to /var/cache/conftool/dbconfig/20250414-111247-fceratto.json
11:12 moritzm: restart spamassassin on lists* to pick up Perl security updates
10:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 90% (T360589), CommonSettings: remove outdated SecurePoll comment (T209892) (duration: 17m 26s)
10:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74938 and previous config saved to /var/cache/conftool/dbconfig/20250414-105741-fceratto.json
10:57 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
10:53 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74937 and previous config saved to /var/cache/conftool/dbconfig/20250414-105351-fceratto.json
10:53 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: Maintenance
10:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74936 and previous config saved to /var/cache/conftool/dbconfig/20250414-105329-fceratto.json
10:53 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-text_ulsfo
10:52 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-upload_ulsfo and not P{cp4047.ulsfo.wmnet} and A:cp
10:49 ladsgroup@deploy1003: ladsgroup, novemlinguae: Continuing with sync
10:48 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_ulsfo and not P{cp4047.ulsfo.wmnet} and A:cp
10:48 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_ulsfo
10:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74935 and previous config saved to /var/cache/conftool/dbconfig/20250414-104758-root.json
10:47 ladsgroup@deploy1003: ladsgroup, novemlinguae: Backport for Bump thumbnail steps to 90% (T360589), CommonSettings: remove outdated SecurePoll comment (T209892) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:44 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
10:43 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in ulsfo - T391334
10:42 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
10:41 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 90% (T360589), CommonSettings: remove outdated SecurePoll comment (T209892)
10:40 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
10:40 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
10:40 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
10:40 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
10:35 vgutierrez: upload varnish 7.1.1-1.1~bpo11+wmf3 to apt.wm.o (bullseye-wikimedia) - T391334
10:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74933 and previous config saved to /var/cache/conftool/dbconfig/20250414-103253-root.json
10:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P74932 and previous config saved to /var/cache/conftool/dbconfig/20250414-102316-fceratto.json
10:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P74931 and previous config saved to /var/cache/conftool/dbconfig/20250414-101748-root.json
10:15 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 90% (T360589), CommonSettings: remove outdated SecurePoll comment (T209892)
10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74930 and previous config saved to /var/cache/conftool/dbconfig/20250414-100809-fceratto.json
10:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74929 and previous config saved to /var/cache/conftool/dbconfig/20250414-100412-fceratto.json
10:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74928 and previous config saved to /var/cache/conftool/dbconfig/20250414-100242-root.json
10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc1', diff saved to https://phabricator.wikimedia.org/P74927 and previous config saved to /var/cache/conftool/dbconfig/20250414-100135-marostegui.json
10:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc1', diff saved to https://phabricator.wikimedia.org/P74925 and previous config saved to /var/cache/conftool/dbconfig/20250414-100038-marostegui.json
09:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
09:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P74924 and previous config saved to /var/cache/conftool/dbconfig/20250414-094737-root.json
09:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2220 gradually with 4 steps - Finished upgrading host
09:33 vgutierrez: restarting acme-chief API servers to catch up on liblzma updates
09:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P74922 and previous config saved to /var/cache/conftool/dbconfig/20250414-093232-root.json
09:31 vgutierrez: restarting acme-chief to catch up on liblzma updates
09:21 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2230.codfw.wmnet
09:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74919 and previous config saved to /var/cache/conftool/dbconfig/20250414-091727-root.json
09:15 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2230.codfw.wmnet
09:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P74917 and previous config saved to /var/cache/conftool/dbconfig/20250414-090222-root.json
09:00 XioNoX: gnmic: bump `num-workers` to 16 on netflow1002 - T388641
08:48 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2220 gradually with 4 steps - Finished upgrading host
08:47 moritzm: installing Postgres 15 security updates
08:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74914 and previous config saved to /var/cache/conftool/dbconfig/20250414-084716-root.json
08:46 fabfur: enable-puppet on A:cp (T391670)
08:45 moritzm: restart Postfix/Dovecot on outbound MXes to pick up xz security updates
08:41 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1178.eqiad.wmnet with OS bullseye
08:40 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
08:39 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
08:39 moritzm: restarting ircstream on irc1003, clients will reconnect automatically
08:39 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2220.codfw.wmnet
08:36 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
08:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp1111.eqiad.wmnet
08:34 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
08:32 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2220.codfw.wmnet
08:31 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp1111.eqiad.wmnet
08:31 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2220 - Upgrading host
08:30 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2220 - Upgrading host
08:27 fabfur: disable-puppet on A:cp to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/1135827 (T391670)
08:26 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
08:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host db1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P74912 and previous config saved to /var/cache/conftool/dbconfig/20250414-082235-marostegui.json
08:20 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1178.eqiad.wmnet with OS bullseye
08:11 moritzm: restarting clamav on vrts to pick up liblzma security updates
07:58 moritzm: rebalance ganeti/B T391243
07:53 XioNoX: gnmic: bump `num-workers` to 12 on netflow1002 - T388641
07:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
07:42 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
07:39 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: sync
07:37 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: sync
07:37 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: sync
07:36 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/proton: sync
07:27 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: sync
07:27 elukey@deploy1003: helmfile [staging] START helmfile.d/services/proton: sync
07:26 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host db1178.eqiad.wmnet with OS bullseye
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc4 T391454', diff saved to https://phabricator.wikimedia.org/P74911 and previous config saved to /var/cache/conftool/dbconfig/20250414-072437-marostegui.json
07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc4 T391454', diff saved to https://phabricator.wikimedia.org/P74910 and previous config saved to /var/cache/conftool/dbconfig/20250414-071653-marostegui.json
07:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2014.codfw.wmnet,pc1014.eqiad.wmnet with reason: Maintenance
07:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1044.eqiad.wmnet
07:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1044.eqiad.wmnet
07:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc3 T391454', diff saved to https://phabricator.wikimedia.org/P74909 and previous config saved to /var/cache/conftool/dbconfig/20250414-070220-marostegui.json
07:01 moritzm: installing subversion security updates
06:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
06:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance
06:54 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc3 T391454', diff saved to https://phabricator.wikimedia.org/P74908 and previous config saved to /var/cache/conftool/dbconfig/20250414-065203-marostegui.json
06:51 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
06:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
06:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
06:41 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
06:15 moritzm: installing perl security updates
06:12 _joe_: uploaded conftool 5.1.0

2025-04-12

19:16 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
19:12 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
16:19 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
16:08 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
16:06 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
16:06 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:04 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
16:04 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:00 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply

2025-04-11

23:02 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
23:01 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
22:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
22:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
21:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2014.codfw.wmnet with OS bullseye
21:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2014.codfw.wmnet with reason: host reimage
21:37 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2014.codfw.wmnet with reason: host reimage
20:58 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from cirrussearch2014 to cirrussearch2104
20:58 bking@cumin2002: START - Cookbook sre.hosts.rename from cirrussearch2014 to cirrussearch2104
20:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2105.codfw.wmnet with OS bullseye
20:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2014
20:41 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2014
20:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2105.codfw.wmnet with reason: host reimage
20:35 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2014
20:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2014.codfw.wmnet 69.48.192.10.in-addr.arpa 9.6.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
20:35 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2014.codfw.wmnet 69.48.192.10.in-addr.arpa 9.6.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
20:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:35 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2014 - bking@cumin2002"
20:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2014 - bking@cumin2002"
20:32 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2105.codfw.wmnet with reason: host reimage
20:27 bking@cumin2002: START - Cookbook sre.dns.netbox
20:26 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2014
20:25 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2014.codfw.wmnet with OS bullseye
20:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2105
20:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2105
20:14 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2105
20:14 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2105.codfw.wmnet 70.48.192.10.in-addr.arpa 0.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
20:14 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2105.codfw.wmnet 70.48.192.10.in-addr.arpa 0.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
20:14 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:14 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2105 - ryankemper@cumin2002"
20:14 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2105 - ryankemper@cumin2002"
20:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2104.codfw.wmnet on all recursors
20:13 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2104.codfw.wmnet on all recursors
20:11 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "fix typo (cirrussearch2014 should be cirrussearch2104) - bking@cumin2002 - T388610"
20:11 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "fix typo (cirrussearch2014 should be cirrussearch2104) - bking@cumin2002 - T388610"
20:06 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
20:06 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2105
20:06 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2105.codfw.wmnet with OS bullseye
19:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2105 to cirrussearch2105
19:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2105
19:57 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2105
19:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2105 to cirrussearch2105 - ryankemper@cumin2002"
19:57 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2105 to cirrussearch2105 - ryankemper@cumin2002"
19:52 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2014.codfw.wmnet on all recursors
19:52 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2014.codfw.wmnet on all recursors
19:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2104.codfw.wmnet on all recursors
19:49 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2104.codfw.wmnet on all recursors
19:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2104 to cirrussearch2014
19:48 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2014
19:48 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2014
19:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2104 to cirrussearch2014 - bking@cumin2002"
19:45 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2104 to cirrussearch2014 - bking@cumin2002"
19:45 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
19:44 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic2105 to cirrussearch2105
19:40 bking@cumin2002: START - Cookbook sre.dns.netbox
19:39 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2104 to cirrussearch2014
18:45 topranks: remove et-0/0/0 from ae0 LAG bundle on cr3-ulsfo and cr4-ulsfo T390731
18:41 cmooney@dns2005: END - running authdns-update
18:41 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:41 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns records for new separate routed link in ulsfo - cmooney@cumin1002"
18:39 cmooney@dns2005: START - running authdns-update
18:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns records for new separate routed link in ulsfo - cmooney@cumin1002"
18:32 cmooney@cumin1002: START - Cookbook sre.dns.netbox
17:53 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:53 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add test server IP dns nokia lab - cmooney@cumin1002"
17:53 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add test server IP dns nokia lab - cmooney@cumin1002"
17:47 cmooney@cumin1002: START - Cookbook sre.dns.netbox
17:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2085.codfw.wmnet with OS bullseye
17:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1007.eqiad.wmnet with OS bullseye
17:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:37 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1007.eqiad.wmnet with reason: host reimage
17:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1007.eqiad.wmnet with reason: host reimage
17:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2085.codfw.wmnet with reason: host reimage
17:15 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2085.codfw.wmnet with reason: host reimage
17:08 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-druid1007.eqiad.wmnet with OS bullseye
17:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
17:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2085
17:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2085
16:59 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2085
16:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2085.codfw.wmnet 72.48.192.10.in-addr.arpa 2.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:59 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2085.codfw.wmnet 72.48.192.10.in-addr.arpa 2.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:59 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2085 - bking@cumin2002"
16:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2085 - bking@cumin2002"
16:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:54 bking@cumin2002: START - Cookbook sre.dns.netbox
16:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:48 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-druid1007
16:48 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-druid1007
16:47 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:45 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-druid1007
16:45 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-druid1007
16:44 bking@cumin2002: START - Cookbook sre.dns.netbox
16:44 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2085
16:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2085.codfw.wmnet with OS bullseye
16:42 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cirrussearch2085.codfw.wmnet with OS bullseye
16:42 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2085.codfw.wmnet with OS bullseye
16:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:33 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2085.codfw.wmnet with OS bullseye
16:33 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host cirrussearch2085
16:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:33 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2085
16:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2085.codfw.wmnet with OS bullseye
16:32 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2085.codfw.wmnet on all recursors
16:32 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2085.codfw.wmnet on all recursors
16:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2085 to cirrussearch2085
16:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2085
16:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2085
16:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:27 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2085 to cirrussearch2085 - bking@cumin2002"
16:27 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on 15 hosts with reason: reimaging/migrating hosts
16:26 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2085 to cirrussearch2085 - bking@cumin2002"
16:22 bking@cumin2002: START - Cookbook sre.dns.netbox
16:21 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2085 to cirrussearch2085
16:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-druid1007
16:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-druid1007
16:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1013.eqiad.wmnet with OS bullseye
16:09 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1012.eqiad.wmnet with OS bullseye
16:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:52 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add test server IP dns nokia lab - cmooney@cumin1002"
15:47 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add test server IP dns nokia lab - cmooney@cumin1002"
15:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1013.eqiad.wmnet with reason: host reimage
15:43 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1013.eqiad.wmnet with reason: host reimage
15:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1012.eqiad.wmnet with reason: host reimage
15:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1012.eqiad.wmnet with reason: host reimage
15:37 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia nginx_1.22.1-9+deb12u1+ech1_amd64.changes: T205378
15:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host druid1013.eqiad.wmnet with OS bullseye
15:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host druid1012.eqiad.wmnet with OS bullseye
15:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-druid1007.eqiad.wmnet with OS bullseye
15:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2114.codfw.wmnet with OS bullseye
15:23 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia openssl_3.4.1-1+ech3_amd64.changes: T205378
15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2142.codfw.wmnet
15:23 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2142.codfw.wmnet
15:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host druid1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1006.eqiad.wmnet with OS bullseye
15:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:22 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker2142.codfw.wmnet
15:19 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2142.codfw.wmnet
15:19 claime: homer lsw1-c2-codfw* commit T391341
15:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host druid1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1006.eqiad.wmnet with reason: host reimage
15:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:05 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:05 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1006.eqiad.wmnet with reason: host reimage
15:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2114.codfw.wmnet with reason: host reimage
15:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
14:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2114.codfw.wmnet with reason: host reimage
14:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2142']
14:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142']
14:53 sukhe: reprepro -C component/nginx-ech remove bookworm-wikimedia libssl3t64: removing libssl3t* since we dropped support for 64-bit time
14:52 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:49 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on releases2003.codfw.wmnet with reason: Bookworm Re-image
14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2114
14:43 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2114
14:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2114.codfw.wmnet with OS bullseye
14:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-druid1007.eqiad.wmnet with OS bullseye
14:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-druid1006.eqiad.wmnet with OS bullseye
13:33 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia openssl_3.4.1-1+ech2_amd64.changes: T205378
13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Change weight for db1180 T390510', diff saved to https://phabricator.wikimedia.org/P74901 and previous config saved to /var/cache/conftool/dbconfig/20250411-132518-marostegui.json
12:10 godog: bounce thanos-query thanos-query-frontend thanos-store on titan1*
11:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1169.eqiad.wmnet
11:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1169.eqiad.wmnet
10:29 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1169.eqiad.wmnet
10:22 btullis@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1169.eqiad.wmnet
10:09 slyngshede@dns1004: END - running authdns-update
10:07 slyngshede@dns1004: START - running authdns-update
10:07 slyngshede@dns1004: START - running authdns-update
09:56 slyngshede@dns1004: END - running authdns-update
09:53 slyngshede@dns1004: START - running authdns-update
09:53 slyngshede@dns1004: END - running authdns-update
09:51 slyngshede@dns1004: START - running authdns-update
09:46 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1169.eqiad.wmnet with OS bullseye
09:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1169.eqiad.wmnet with reason: host reimage
09:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:20 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1169.eqiad.wmnet with reason: host reimage
09:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:05 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1169.eqiad.wmnet with OS bullseye
09:05 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1169.eqiad.wmnet with OS bullseye
08:44 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1169.eqiad.wmnet with OS bullseye
08:44 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1169.eqiad.wmnet with OS bullseye
07:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
07:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply

2025-04-10

22:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T391056)', diff saved to https://phabricator.wikimedia.org/P74899 and previous config saved to /var/cache/conftool/dbconfig/20250410-223055-fceratto.json
22:20 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2055.codfw.wmnet|cirrussearch2056.codfw.wmnet|cirrussearch2062.codfw.wmnet|cirrussearch2068.codfw.wmnet|cirrussearch2069.codfw.wmnet|cirrussearch2074.codfw.wmnet|cirrussearch2075.codfw.wmnet|cirrussearch2087.codfw.wmnet|cirrussearch2088.codfw.wmnet|cirrussearch2089.codfw.wmnet|cirrussearch2090.codfw.wmnet|cirrussearch2091.codf
22:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P74898 and previous config saved to /var/cache/conftool/dbconfig/20250410-221548-fceratto.json
22:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P74897 and previous config saved to /var/cache/conftool/dbconfig/20250410-220040-fceratto.json
21:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T391056)', diff saved to https://phabricator.wikimedia.org/P74896 and previous config saved to /var/cache/conftool/dbconfig/20250410-214533-fceratto.json
21:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2228 (T391056)', diff saved to https://phabricator.wikimedia.org/P74894 and previous config saved to /var/cache/conftool/dbconfig/20250410-214205-fceratto.json
21:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
21:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2228.codfw.wmnet with reason: Maintenance
21:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T391056)', diff saved to https://phabricator.wikimedia.org/P74893 and previous config saved to /var/cache/conftool/dbconfig/20250410-214128-fceratto.json
21:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P74892 and previous config saved to /var/cache/conftool/dbconfig/20250410-212621-fceratto.json
21:16 bking@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2091.codfw.wmnet with OS bullseye
21:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P74891 and previous config saved to /var/cache/conftool/dbconfig/20250410-211114-fceratto.json
20:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T391056)', diff saved to https://phabricator.wikimedia.org/P74890 and previous config saved to /var/cache/conftool/dbconfig/20250410-205606-fceratto.json
20:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2223 (T391056)', diff saved to https://phabricator.wikimedia.org/P74889 and previous config saved to /var/cache/conftool/dbconfig/20250410-205211-fceratto.json
20:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2223.codfw.wmnet with reason: Maintenance
20:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74888 and previous config saved to /var/cache/conftool/dbconfig/20250410-205148-fceratto.json
20:41 bking@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
20:41 bking@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2091
20:40 bking@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2091
20:40 bking@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2091.codfw.wmnet 99.0.192.10.in-addr.arpa 9.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
20:40 bking@cumin1002: START - Cookbook sre.dns.wipe-cache cirrussearch2091.codfw.wmnet 99.0.192.10.in-addr.arpa 9.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
20:40 bking@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:40 bking@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2091 - bking@cumin1002"
20:40 bking@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2091 - bking@cumin1002"
20:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P74887 and previous config saved to /var/cache/conftool/dbconfig/20250410-203640-fceratto.json
20:34 bking@cumin1002: START - Cookbook sre.dns.netbox
20:34 bking@cumin1002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
20:34 bking@cumin1002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
20:30 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
20:30 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
20:28 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
20:27 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
20:26 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
20:26 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
20:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2091 to cirrussearch2091
20:24 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2091
20:24 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2091
20:24 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:24 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2091 to cirrussearch2091 - bking@cumin2002"
20:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P74886 and previous config saved to /var/cache/conftool/dbconfig/20250410-202132-fceratto.json
20:17 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2091 to cirrussearch2091 - bking@cumin2002"
20:17 cdobbins@dns1004: END - running authdns-update
20:15 cdobbins@dns1004: START - running authdns-update
20:13 bking@cumin2002: START - Cookbook sre.dns.netbox
20:13 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2091 to cirrussearch2091
20:09 cdobbins@dns1004: END - running authdns-update
20:07 cdobbins@dns1004: START - running authdns-update
20:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74885 and previous config saved to /var/cache/conftool/dbconfig/20250410-200625-fceratto.json
20:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74884 and previous config saved to /var/cache/conftool/dbconfig/20250410-200233-fceratto.json
20:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2211.codfw.wmnet with reason: Maintenance
20:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2201.codfw.wmnet with reason: Maintenance
20:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74883 and previous config saved to /var/cache/conftool/dbconfig/20250410-200022-fceratto.json
19:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P74882 and previous config saved to /var/cache/conftool/dbconfig/20250410-194514-fceratto.json
19:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2111.codfw.wmnet with OS bullseye
19:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P74881 and previous config saved to /var/cache/conftool/dbconfig/20250410-193007-fceratto.json
19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2111.codfw.wmnet with reason: host reimage
19:22 tzatziki: removing 2 files for legal compliance
19:20 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2111.codfw.wmnet with reason: host reimage
19:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74880 and previous config saved to /var/cache/conftool/dbconfig/20250410-191459-fceratto.json
19:13 tzatziki: removing 1 file for legal compliance
19:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74879 and previous config saved to /var/cache/conftool/dbconfig/20250410-191226-fceratto.json
19:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2192.codfw.wmnet with reason: Maintenance
19:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74878 and previous config saved to /var/cache/conftool/dbconfig/20250410-191214-fceratto.json
18:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2111
18:58 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2111
18:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2111.codfw.wmnet with OS bullseye
18:57 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2111.codfw.wmnet with OS bullseye
18:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P74877 and previous config saved to /var/cache/conftool/dbconfig/20250410-185706-fceratto.json
18:45 jforrester@deploy1003: Finished scap sync-world: Backport for WikifunctionsClientUsageUpdateJob: Also init targetPageNamespace, Special pages: Don't list or let execute repo-only ones on client wikis (T391594), InitializeSettings: add wgSecurePollEditOtherWikis (T384302) (duration: 12m 42s)
18:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P74875 and previous config saved to /var/cache/conftool/dbconfig/20250410-184159-fceratto.json
18:38 jforrester@deploy1003: novemlinguae, jforrester: Continuing with sync
18:37 jforrester@deploy1003: novemlinguae, jforrester: Backport for WikifunctionsClientUsageUpdateJob: Also init targetPageNamespace, Special pages: Don't list or let execute repo-only ones on client wikis (T391594), InitializeSettings: add wgSecurePollEditOtherWikis (T384302) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:33 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2111
18:33 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2111
18:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2111.codfw.wmnet with OS bullseye
18:33 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cirrussearch2111']
18:32 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2111']
18:32 jforrester@deploy1003: Started scap sync-world: Backport for WikifunctionsClientUsageUpdateJob: Also init targetPageNamespace, Special pages: Don't list or let execute repo-only ones on client wikis (T391594), InitializeSettings: add wgSecurePollEditOtherWikis (T384302)
18:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2111']
18:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74873 and previous config saved to /var/cache/conftool/dbconfig/20250410-182652-fceratto.json
18:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74872 and previous config saved to /var/cache/conftool/dbconfig/20250410-182319-fceratto.json
18:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2178.codfw.wmnet with reason: Maintenance
18:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T391056)', diff saved to https://phabricator.wikimedia.org/P74871 and previous config saved to /var/cache/conftool/dbconfig/20250410-182257-fceratto.json
18:21 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2111']
18:20 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.24 refs T386219
18:11 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2111']
18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cirrussearch2111']
18:11 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2111']
18:09 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2111']
18:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P74870 and previous config saved to /var/cache/conftool/dbconfig/20250410-180749-fceratto.json
18:07 brennen: 1.44.0-wmf.24 train status (T386219): logs quiet, no current blockers, moving to all wikis
18:00 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2111']
18:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2111.codfw.wmnet with OS bullseye
17:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2111
17:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2111
17:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2111.codfw.wmnet with OS bullseye
17:53 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2111.codfw.wmnet with OS bullseye
17:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P74869 and previous config saved to /var/cache/conftool/dbconfig/20250410-175242-fceratto.json
17:45 dancy@deploy1003: Finished scap sync-world: Backport for WikiLambdaApiBase: Add logging for every remaining dieWith?(Z)Error, Set WikiLambdaClientTargetAPI default value to protocol-relative, so HSTS doesn't sting us (T391534), WikifunctionsClientUsageUpdateJob: Don't pass a heavy Title in, just the scalars (T391533) (duration: 13m 28s)
17:39 dancy@deploy1003: dancy, jforrester: Continuing with sync
17:37 dancy@deploy1003: dancy, jforrester: Backport for WikiLambdaApiBase: Add logging for every remaining dieWith?(Z)Error, Set WikiLambdaClientTargetAPI default value to protocol-relative, so HSTS doesn't sting us (T391534), WikifunctionsClientUsageUpdateJob: Don't pass a heavy Title in, just the scalars (T391533) synced to the testservers (https://wikitech.wikimedi
17:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T391056)', diff saved to https://phabricator.wikimedia.org/P74868 and previous config saved to /var/cache/conftool/dbconfig/20250410-173735-fceratto.json
17:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2111
17:35 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2111
17:35 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2111.codfw.wmnet with OS bullseye
17:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2111.codfw.wmnet on all recursors
17:35 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2111.codfw.wmnet on all recursors
17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T391056)', diff saved to https://phabricator.wikimedia.org/P74867 and previous config saved to /var/cache/conftool/dbconfig/20250410-173339-fceratto.json
17:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: Maintenance
17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74866 and previous config saved to /var/cache/conftool/dbconfig/20250410-173315-fceratto.json
17:33 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2111 to cirrussearch2111
17:32 dancy@deploy1003: Started scap sync-world: Backport for WikiLambdaApiBase: Add logging for every remaining dieWith?(Z)Error, Set WikiLambdaClientTargetAPI default value to protocol-relative, so HSTS doesn't sting us (T391534), WikifunctionsClientUsageUpdateJob: Don't pass a heavy Title in, just the scalars (T391533)
17:32 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2111
17:31 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2111
17:31 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:31 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2111 to cirrussearch2111 - bking@cumin2002"
17:31 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2111 to cirrussearch2111 - bking@cumin2002"
17:25 bking@cumin2002: START - Cookbook sre.dns.netbox
17:24 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2111 to cirrussearch2111
17:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P74865 and previous config saved to /var/cache/conftool/dbconfig/20250410-171808-fceratto.json
17:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P74863 and previous config saved to /var/cache/conftool/dbconfig/20250410-170300-fceratto.json
16:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2075.codfw.wmnet with OS bullseye
16:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74862 and previous config saved to /var/cache/conftool/dbconfig/20250410-164753-fceratto.json
16:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74861 and previous config saved to /var/cache/conftool/dbconfig/20250410-164400-fceratto.json
16:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2157.codfw.wmnet with reason: Maintenance
16:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
16:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1245.eqiad.wmnet with reason: Maintenance
16:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1216.eqiad.wmnet with reason: Maintenance
16:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T391056)', diff saved to https://phabricator.wikimedia.org/P74860 and previous config saved to /var/cache/conftool/dbconfig/20250410-164049-fceratto.json
16:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2075.codfw.wmnet with reason: host reimage
16:34 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2075.codfw.wmnet with reason: host reimage
16:33 jiji@cumin1002: conftool action : set/pooled=yes; selector: name=mwdebug2002.codfw.wmnet
16:30 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwdebug2002.codfw.wmnet with OS bullseye
16:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P74859 and previous config saved to /var/cache/conftool/dbconfig/20250410-162542-fceratto.json
16:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2075
16:18 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2075
16:18 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2075
16:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2075.codfw.wmnet 145.0.192.10.in-addr.arpa 5.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:18 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2075.codfw.wmnet 145.0.192.10.in-addr.arpa 5.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:18 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2075 - bking@cumin2002"
16:18 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2075 - bking@cumin2002"
16:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P74858 and previous config saved to /var/cache/conftool/dbconfig/20250410-161036-fceratto.json
16:06 bking@cumin2002: START - Cookbook sre.dns.netbox
16:06 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2075
16:06 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2075.codfw.wmnet with OS bullseye
16:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2075 to cirrussearch2075
16:02 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2075
16:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2075
16:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2075 to cirrussearch2075 - bking@cumin2002"
15:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T391056)', diff saved to https://phabricator.wikimedia.org/P74857 and previous config saved to /var/cache/conftool/dbconfig/20250410-155528-fceratto.json
15:54 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on releases2003.codfw.wmnet with reason: Bookworm Re-image
15:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T391056)', diff saved to https://phabricator.wikimedia.org/P74856 and previous config saved to /var/cache/conftool/dbconfig/20250410-155241-fceratto.json
15:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1210.eqiad.wmnet with reason: Maintenance
15:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T391056)', diff saved to https://phabricator.wikimedia.org/P74855 and previous config saved to /var/cache/conftool/dbconfig/20250410-155220-fceratto.json
15:52 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug2002.codfw.wmnet with reason: host reimage
15:48 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug2002.codfw.wmnet with reason: host reimage
15:41 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2075 to cirrussearch2075 - bking@cumin2002"
15:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P74854 and previous config saved to /var/cache/conftool/dbconfig/20250410-153713-fceratto.json
15:29 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mwdebug2002.codfw.wmnet with OS bullseye
15:23 bking@cumin2002: START - Cookbook sre.dns.netbox
15:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2075 to cirrussearch2075
15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P74852 and previous config saved to /var/cache/conftool/dbconfig/20250410-152206-fceratto.json
15:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2074.codfw.wmnet with OS bullseye
15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T391056)', diff saved to https://phabricator.wikimedia.org/P74851 and previous config saved to /var/cache/conftool/dbconfig/20250410-150658-fceratto.json
15:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T391056)', diff saved to https://phabricator.wikimedia.org/P74850 and previous config saved to /var/cache/conftool/dbconfig/20250410-150431-fceratto.json
15:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1200.eqiad.wmnet with reason: Maintenance
15:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T391056)', diff saved to https://phabricator.wikimedia.org/P74849 and previous config saved to /var/cache/conftool/dbconfig/20250410-150407-fceratto.json
14:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2074.codfw.wmnet with reason: host reimage
14:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P74848 and previous config saved to /var/cache/conftool/dbconfig/20250410-144900-fceratto.json
14:47 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2074.codfw.wmnet with reason: host reimage
14:37 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
14:36 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
14:36 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
14:36 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
14:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P74847 and previous config saved to /var/cache/conftool/dbconfig/20250410-143352-fceratto.json
14:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2074
14:31 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2074
14:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2074
14:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2074.codfw.wmnet 138.0.192.10.in-addr.arpa 8.3.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:28 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2074.codfw.wmnet 138.0.192.10.in-addr.arpa 8.3.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2074 - bking@cumin2002"
14:28 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2074 - bking@cumin2002"
14:24 bking@cumin2002: START - Cookbook sre.dns.netbox
14:23 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2074
14:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2074.codfw.wmnet with OS bullseye
14:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2074 to cirrussearch2074
14:21 godog: stop curator_actions_cluster_wide.service on logging-sd1001 - forcemerge causing kafka lag
14:21 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2074
14:21 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2074
14:21 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:21 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2074 to cirrussearch2074 - bking@cumin2002"
14:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2074 to cirrussearch2074 - bking@cumin2002"
14:20 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1002.eqiad.wmnet
14:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T391056)', diff saved to https://phabricator.wikimedia.org/P74845 and previous config saved to /var/cache/conftool/dbconfig/20250410-141845-fceratto.json
14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142']
14:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T391056)', diff saved to https://phabricator.wikimedia.org/P74844 and previous config saved to /var/cache/conftool/dbconfig/20250410-141619-fceratto.json
14:16 bking@cumin2002: START - Cookbook sre.dns.netbox
14:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: Maintenance
14:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T391056)', diff saved to https://phabricator.wikimedia.org/P74843 and previous config saved to /var/cache/conftool/dbconfig/20250410-141608-fceratto.json
14:15 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2074 to cirrussearch2074
14:14 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-misc1002.eqiad.wmnet
14:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1001.eqiad.wmnet
14:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-misc1001.eqiad.wmnet
14:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P74842 and previous config saved to /var/cache/conftool/dbconfig/20250410-140100-fceratto.json
13:56 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug2002.codfw.wmnet
13:55 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
13:55 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
13:55 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
13:54 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
13:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142']
13:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142']
13:49 jiji@cumin1002: conftool action : set/pooled=yes; selector: name=mwdebug1002.eqiad.wmnet
13:46 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1005.eqiad.wmnet
13:46 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:46 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
13:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P74841 and previous config saved to /var/cache/conftool/dbconfig/20250410-134553-fceratto.json
13:44 Lucas_WMDE: UTC afternoon backport+config window done
13:44 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
13:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
13:37 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for AX: Enable Quick Surveys extension on Asturian and Lombard wiki (T390023), AX: Enable entry-points on Asturian and Lombard wiki (T390023) (duration: 15m 42s)
13:34 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1135431 to enable haproxy requestctl rules everywhere (T370745)
13:34 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1005.eqiad.wmnet
13:31 lucaswerkmeister-wmde@deploy1003: abi, lucaswerkmeister-wmde: Continuing with sync
13:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T391056)', diff saved to https://phabricator.wikimedia.org/P74840 and previous config saved to /var/cache/conftool/dbconfig/20250410-133046-fceratto.json
13:28 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwdebug1002.eqiad.wmnet with OS bullseye
13:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T391056)', diff saved to https://phabricator.wikimedia.org/P74839 and previous config saved to /var/cache/conftool/dbconfig/20250410-132756-fceratto.json
13:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: Maintenance
13:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T391056)', diff saved to https://phabricator.wikimedia.org/P74838 and previous config saved to /var/cache/conftool/dbconfig/20250410-132744-fceratto.json
13:27 lucaswerkmeister-wmde@deploy1003: abi, lucaswerkmeister-wmde: Backport for AX: Enable Quick Surveys extension on Asturian and Lombard wiki (T390023), AX: Enable entry-points on Asturian and Lombard wiki (T390023) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:26 tappof: expand LVs on prometheus instances (k8s-dse)
13:22 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for AX: Enable Quick Surveys extension on Asturian and Lombard wiki (T390023), AX: Enable entry-points on Asturian and Lombard wiki (T390023)
13:20 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1169.eqiad.wmnet with OS bullseye
13:13 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
13:12 klausman@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P74837 and previous config saved to /var/cache/conftool/dbconfig/20250410-131237-fceratto.json
12:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P74836 and previous config saved to /var/cache/conftool/dbconfig/20250410-125729-fceratto.json
12:56 klausman@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
12:56 klausman@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
12:52 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug1002.eqiad.wmnet with reason: host reimage
12:51 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2002.codfw.wmnet
12:48 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug1002.eqiad.wmnet with reason: host reimage
12:45 reedy@deploy1003: Synchronized wmf-config/interwiki-labs.php: Update! (duration: 14m 07s)
12:43 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-wf2002.codfw.wmnet
12:43 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2001.codfw.wmnet
12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T391056)', diff saved to https://phabricator.wikimedia.org/P74834 and previous config saved to /var/cache/conftool/dbconfig/20250410-124222-fceratto.json
12:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T391056)', diff saved to https://phabricator.wikimedia.org/P74833 and previous config saved to /var/cache/conftool/dbconfig/20250410-123931-fceratto.json
12:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: Maintenance
12:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T391056)', diff saved to https://phabricator.wikimedia.org/P74832 and previous config saved to /var/cache/conftool/dbconfig/20250410-123850-fceratto.json
12:37 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-wf2001.codfw.wmnet
12:36 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1002.eqiad.wmnet
12:29 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-wf1002.eqiad.wmnet
12:29 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1001.eqiad.wmnet
12:28 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mwdebug1002.eqiad.wmnet with OS bullseye
12:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
12:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P74831 and previous config saved to /var/cache/conftool/dbconfig/20250410-122343-fceratto.json
12:22 btullis@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1169
12:22 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-wf1001.eqiad.wmnet
12:22 btullis@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1169
12:21 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:21 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1169 - btullis@cumin1002"
12:20 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1169 - btullis@cumin1002"
12:20 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
12:20 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
12:19 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
12:18 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
12:17 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
12:15 cgoubert@deploy1003: Finished scap sync-world: Rebuilding mediawiki images to pick up new base images 1135694 - T387208 (duration: 44m 51s)
12:14 btullis@cumin1002: START - Cookbook sre.dns.netbox
12:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P74829 and previous config saved to /var/cache/conftool/dbconfig/20250410-120835-fceratto.json
12:08 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
12:06 btullis@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1169
12:06 btullis@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1169
12:06 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:03 btullis@cumin1002: START - Cookbook sre.dns.netbox
12:03 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
12:01 btullis@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1169
12:01 btullis@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1169
12:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
11:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
11:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
11:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
11:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
11:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
11:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
11:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
11:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
11:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
11:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
11:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
11:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
11:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
11:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
11:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
11:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
11:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T391056)', diff saved to https://phabricator.wikimedia.org/P74828 and previous config saved to /var/cache/conftool/dbconfig/20250410-115328-fceratto.json
11:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
11:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1159 (T391056)', diff saved to https://phabricator.wikimedia.org/P74827 and previous config saved to /var/cache/conftool/dbconfig/20250410-115037-fceratto.json
11:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
11:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1159.eqiad.wmnet with reason: Maintenance
11:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
11:32 cgoubert@deploy1003: Started scap sync-world: Rebuilding mediawiki images to pick up new base images 1135694 - T387208
11:28 claime: Rebuilding php base images to pick up 1135694 - T387208
11:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 85% (T360589) (duration: 16m 20s)
11:17 ladsgroup@deploy1003: ladsgroup: Continuing with sync
11:15 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 85% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:10 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 85% (T360589)
11:08 cgoubert@deploy1003: Finished scap sync-world: Backport for MWScript.php: exit code on mesh, longer timeout (T390972 T387208) (duration: 22m 15s)
10:55 cgoubert@deploy1003: cgoubert: Continuing with sync
10:54 cgoubert@deploy1003: cgoubert: Backport for MWScript.php: exit code on mesh, longer timeout (T390972 T387208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:45 cgoubert@deploy1003: Started scap sync-world: Backport for MWScript.php: exit code on mesh, longer timeout (T390972 T387208)
10:28 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: sync
10:28 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: sync
10:26 elukey: rest-gateway from now on calls citoid on its ingress endpoint
10:23 phedenskog@deploy1003: Finished deploy [performance/navtiming@94fa387]: Disable navtiming performance metrics in Graphite (duration: 00m 50s)
10:23 phedenskog@deploy1003: Started deploy [performance/navtiming@94fa387]: Disable navtiming performance metrics in Graphite
10:21 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: sync
10:20 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: sync
10:19 cgoubert@deploy1003: Started scap sync-world: Rebuilding mediawiki images to pick up new base images 1135379 - T387208
10:19 cgoubert@deploy1003: sync-world aborted: Rebuilding mediawiki images to pick up new base images 1135379 - T387208 (duration: 35m 23s)
09:55 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
09:55 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
09:55 fabfur@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading A:liberica
09:50 fabfur@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading A:liberica
09:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:44 cgoubert@deploy1003: Started scap sync-world: Rebuilding mediawiki images to pick up new base images 1135379 - T387208
09:40 claime: Rebuilding php base images to pick up 1135379 - T387208
09:39 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: sync
09:38 elukey@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: sync
09:32 topranks: decom 2x10G lag from cloudsw1-c8-eqiad to asw2-b-eqiad T391489
09:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
09:23 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
09:17 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
09:17 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
09:13 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-esams and A:liberica
09:11 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-esams and A:liberica
09:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-drmrs and A:liberica
09:08 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-drmrs and A:liberica
09:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-eqsin and A:liberica
09:03 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-eqsin and A:liberica
09:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-ulsfo and A:liberica
09:00 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-ulsfo and A:liberica
08:59 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-magru and not P{lvs7003.magru.wmnet} and A:liberica
08:57 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-magru and not P{lvs7003.magru.wmnet} and A:liberica
08:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing P{lvs7003.magru.wmnet} and A:liberica
08:53 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing P{lvs7003.magru.wmnet} and A:liberica
08:41 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: sync
08:41 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: sync
08:40 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: sync
08:40 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: sync
08:40 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: sync
08:40 elukey@deploy1003: helmfile [staging] START helmfile.d/services/citoid: sync
08:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading A:liberica-canary
08:22 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgrading A:liberica-canary
08:20 vgutierrez: upload liberica 0.13 to bookworm-wikimedia (apt.wm.o)
08:18 elukey@dns1004: END - running authdns-update
08:16 elukey@dns1004: START - running authdns-update
08:02 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.upgrade (exit_code=1) restarting A:liberica-canary
08:01 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting A:liberica-canary
07:56 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading A:liberica-canary
07:56 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgrading A:liberica-canary
07:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting A:liberica-canary
07:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling A:liberica-canary
07:47 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
07:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
07:47 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
07:46 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting A:liberica-canary
07:44 vgutierrez: rollback to liberica 0.11 in lvs1013
07:40 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading A:liberica-canary
07:39 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgrading A:liberica-canary
07:35 vgutierrez: upload liberica 0.12 to bookworm-wikimedia (apt.wm.o)
07:21 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1180 to s6 vslow/dump', diff saved to https://phabricator.wikimedia.org/P74824 and previous config saved to /var/cache/conftool/dbconfig/20250410-072127-marostegui.json
06:55 marostegui: Migrate pc2 to MariaDB 10.11 T391454
06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc2 T391454', diff saved to https://phabricator.wikimedia.org/P74823 and previous config saved to /var/cache/conftool/dbconfig/20250410-065208-marostegui.json
06:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Maintenance
06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc2 T391454', diff saved to https://phabricator.wikimedia.org/P74822 and previous config saved to /var/cache/conftool/dbconfig/20250410-064511-marostegui.json

2025-04-09

22:53 mutante: apt-staging2001 - sudo systemctl start gitlab-package-puller to fix monitoring alert
22:47 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release
22:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:09 jclark@cumin1002: START - Cookbook sre.hosts.provision for host druid1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:08 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for druid1012/1013 - jclark@cumin1002"
22:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for druid1012/1013 - jclark@cumin1002"
22:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host druid1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:04 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:40 jforrester@deploy1003: Finished scap sync-world: Backport for [test2wiki] Enable Wikifunctions client mode (T383106), MWMultiVersion: Recognise the new wikifunctionsclient dblist (duration: 18m 01s)
21:34 jforrester@deploy1003: jforrester: Continuing with sync
21:29 jforrester@deploy1003: jforrester: Backport for [test2wiki] Enable Wikifunctions client mode (T383106), MWMultiVersion: Recognise the new wikifunctionsclient dblist synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
21:22 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
21:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:22 jforrester@deploy1003: Started scap sync-world: Backport for [test2wiki] Enable Wikifunctions client mode (T383106), MWMultiVersion: Recognise the new wikifunctionsclient dblist
21:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:20 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-druid1006
21:20 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-druid1006
21:20 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-druid1007
21:20 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-druid1007
21:19 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:19 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-druid - jclark@cumin1002"
21:19 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-druid - jclark@cumin1002"
21:19 jforrester@deploy1003: Sync cancelled.
21:15 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:15 jforrester@deploy1003: jforrester: Backport for [test2wiki] Enable Wikifunctions client mode (T383106) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:11 ejegg: fundraising civicrm upgraded from b20436a2 to 38a7a649
21:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:08 jforrester@deploy1003: Started scap sync-world: Backport for [test2wiki] Enable Wikifunctions client mode (T383106)
19:37 dancy@deploy1003: Installation of scap version "4.153.0" completed for 2 hosts
19:35 dancy@deploy1003: Installing scap version "4.153.0" for 2 host(s)
19:24 fab@deploy1003: Finished deploy [airflow-dags/research@ea5f3de]: (no justification provided) (duration: 00m 41s)
19:24 fab@deploy1003: Started deploy [airflow-dags/research@ea5f3de]: (no justification provided)
19:14 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
18:48 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
18:40 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
18:20 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.24 refs T386219
18:06 brennen: 1.44.0-wmf.24 train status (T386219): logs quiet, no current blockers, moving to group1
18:04 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: adding net-new role
17:50 swfrench@deploy1003: Finished scap sync-world: Test scap run after switching to PHP 8.1 container image for maintenance scripts - T390225 (duration: 03m 10s)
17:47 swfrench@deploy1003: Started scap sync-world: Test scap run after switching to PHP 8.1 container image for maintenance scripts - T390225
17:46 swfrench@deploy1003: Stopping before sync operations
17:45 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release
17:45 swfrench@deploy1003: Started scap sync-world: Test stop-before-sync scap run after switching to PHP 8.1 container image for maintenance scripts - T390225
17:38 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
17:34 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
17:33 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
17:21 ladsgroup@deploy1003: Finished scap sync-world: Backport for Increase max db connection count before circuit breaking (T390510) (duration: 16m 47s)
17:19 mutante: apt1002 - updating thirdparty/gitlab-bullseye gitlab-ce package version
17:12 ladsgroup@deploy1003: ladsgroup: Continuing with sync
17:11 ladsgroup@deploy1003: ladsgroup: Backport for Increase max db connection count before circuit breaking (T390510) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:04 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
17:04 sukhe: forcing rechecks for pc1011 and db1151
17:04 ladsgroup@deploy1003: Started scap sync-world: Backport for Increase max db connection count before circuit breaking (T390510)
17:04 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
17:01 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
17:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2109.codfw.wmnet on all recursors
17:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2109.codfw.wmnet on all recursors
17:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2085.codfw.wmnet on all recursors
17:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2085.codfw.wmnet on all recursors
16:59 sukhe: [END] sudo cumin -b11 "O:mariadb::core" "run-puppet-agent"
16:46 sukhe: sudo cumin -b11 "O:mariadb::core" "run-puppet-agent"
16:44 sukhe: forcing puppet run on db2229
16:38 sukhe: merging above change: CR 1135471
16:17 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2088.codfw.wmnet with reason: host reimage
16:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2068.codfw.wmnet with reason: host reimage
15:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:55 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2068.codfw.wmnet with reason: host reimage
15:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2088
15:50 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2088
15:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2088.codfw.wmnet with OS bullseye
15:48 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
15:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:38 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia openssl_3.4.1-1+ech1_amd64.changes: T205378
15:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
15:35 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names, Add wikifunctionsclient dblist for production wikis that allow embedding Wikifunctions calls
15:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:26 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
15:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:20 elukey: restart docker on deploy1003 to revert the push serialization change - T390251
15:16 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
15:11 vgutierrez: upgrading to varnish 7.1.1-1.1~bpo11+wmf3 in cp3073 (text) and cp3081 (upload) - T391334
15:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2068
15:07 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2068
15:07 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2068
15:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2068.codfw.wmnet 102.48.192.10.in-addr.arpa 2.0.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:07 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2068.codfw.wmnet 102.48.192.10.in-addr.arpa 2.0.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:07 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2068 - bking@cumin2002"
15:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2068 - bking@cumin2002"
15:06 jforrester@deploy1003: sync-world aborted: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names, Add wikifunctionsclient dblist for production wikis that allow embedding Wikifunctions calls (duration: 17m 08s)
14:57 bking@cumin2002: START - Cookbook sre.dns.netbox
14:57 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2068
14:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2068.codfw.wmnet with OS bullseye
14:56 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2068.codfw.wmnet on all recursors
14:55 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2068.codfw.wmnet on all recursors
14:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2068 to cirrussearch2068
14:55 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2068
14:55 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2068
14:55 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:55 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2068 to cirrussearch2068 - bking@cumin2002"
14:54 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2068 to cirrussearch2068 - bking@cumin2002"
14:49 bking@cumin2002: START - Cookbook sre.dns.netbox
14:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2068 to cirrussearch2068
14:49 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names, Add wikifunctionsclient dblist for production wikis that allow embedding Wikifunctions calls
14:47 elukey: restart docker on deploy1003
14:47 jforrester@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.44.0-wmf.23,1.44.0-wmf.24 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.disco
14:42 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names, Add wikifunctionsclient dblist for production wikis that allow embedding Wikifunctions calls
14:41 jforrester@deploy1003: sync-world aborted: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names (duration: 06m 26s)
14:36 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
14:34 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names
14:33 jforrester@deploy1003: sync-world aborted: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names (duration: 04m 52s)
14:32 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
14:30 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
14:29 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2088.codfw.wmnet with OS bullseye
14:29 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names
14:28 jforrester@deploy1003: sync-world aborted: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names (duration: 08m 48s)
14:19 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names
14:18 jforrester@deploy1003: sync-world aborted: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names (duration: 07m 47s)
14:11 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names
14:09 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:09 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:08 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:08 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:08 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:07 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:04 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:04 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:03 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2088
13:38 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2088
13:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2088.codfw.wmnet with OS bullseye
13:23 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwdebug1002.eqiad.wmnet with OS bullseye
13:23 samtar@deploy1003: Finished scap sync-world: Backport for madwiktionary: add logo, icon, wordmark and tagline (T391318), arywiki: enable wgMinervaEnableSiteNotice (duration: 16m 14s)
13:17 samtar@deploy1003: samtar, anzx: Continuing with sync
13:15 samtar@deploy1003: samtar, anzx: Backport for madwiktionary: add logo, icon, wordmark and tagline (T391318), arywiki: enable wgMinervaEnableSiteNotice synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:07 samtar@deploy1003: Started scap sync-world: Backport for madwiktionary: add logo, icon, wordmark and tagline (T391318), arywiki: enable wgMinervaEnableSiteNotice
13:00 awight: special window completed
12:47 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug1002.eqiad.wmnet with reason: host reimage
12:43 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug1002.eqiad.wmnet with reason: host reimage
12:27 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mwdebug1002.eqiad.wmnet with OS bullseye
12:10 effie: mwdebug1002 has been depooled and removed from scap dsh
12:09 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
12:09 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
12:06 effie: prepping mwdebug1002 for reimage
11:41 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:41 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:38 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:37 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
11:32 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:30 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
11:29 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:28 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
11:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1407.eqiad.wmnet
11:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw1407.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
11:21 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw1407.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
11:20 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
11:19 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
11:19 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
11:18 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
11:17 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
11:17 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
11:16 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
11:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
11:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
11:10 hnowlan@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw1407.eqiad.wmnet
11:07 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
11:07 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
11:06 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
11:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
11:04 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug1002.eqiad.wmnet
11:01 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2278.codfw.wmnet
11:01 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:00 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
10:59 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
10:59 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
10:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1349-1351].eqiad.wmnet
10:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1349-1351].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
10:59 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1349-1351].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
10:54 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
10:53 hnowlan@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw2278.codfw.wmnet
10:50 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[2278-2279].codfw.wmnet
10:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2278-2279].codfw.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
10:49 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2278-2279].codfw.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
10:42 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
10:41 hnowlan@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[1349-1351].eqiad.wmnet
10:37 hnowlan@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[2278-2279].codfw.wmnet
10:23 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: sync
10:22 elukey@deploy1003: helmfile [staging] START helmfile.d/services/citoid: sync
10:19 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
10:18 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 80% (T360589) (duration: 14m 19s)
10:18 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
10:18 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
10:18 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
10:12 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
10:12 ladsgroup@deploy1003: ladsgroup: Continuing with sync
10:12 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 80% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
10:05 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
10:05 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
10:04 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 80% (T360589)
09:37 cgoubert@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2142.codfw.wmnet with reason: Hardware failure
09:36 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host wikikube-worker2142.codfw.wmnet
09:36 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2142.codfw.wmnet
09:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
09:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
09:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
09:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
09:18 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: sync
09:18 elukey@deploy1003: helmfile [staging] START helmfile.d/services/citoid: sync
09:05 elukey: rollout security upgrades for ghostscript
08:54 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
08:54 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
08:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2243 (T391056)', diff saved to https://phabricator.wikimedia.org/P74814 and previous config saved to /var/cache/conftool/dbconfig/20250409-085347-fceratto.json
08:50 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
08:49 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
08:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2243', diff saved to https://phabricator.wikimedia.org/P74813 and previous config saved to /var/cache/conftool/dbconfig/20250409-083840-fceratto.json
08:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2243', diff saved to https://phabricator.wikimedia.org/P74812 and previous config saved to /var/cache/conftool/dbconfig/20250409-082333-fceratto.json
08:09 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
08:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2090.codfw.wmnet with OS bullseye
08:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2243 (T391056)', diff saved to https://phabricator.wikimedia.org/P74811 and previous config saved to /var/cache/conftool/dbconfig/20250409-080826-fceratto.json
07:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2243 (T391056)', diff saved to https://phabricator.wikimedia.org/P74810 and previous config saved to /var/cache/conftool/dbconfig/20250409-075815-fceratto.json
07:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2243.codfw.wmnet with reason: Maintenance
07:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2242.codfw.wmnet with reason: Maintenance
07:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2090.codfw.wmnet with reason: host reimage
07:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2241.codfw.wmnet with reason: Maintenance
07:37 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2090.codfw.wmnet with reason: host reimage
07:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2200.codfw.wmnet with reason: Maintenance
07:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2198.codfw.wmnet with reason: Maintenance
07:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T391056)', diff saved to https://phabricator.wikimedia.org/P74809 and previous config saved to /var/cache/conftool/dbconfig/20250409-072240-fceratto.json
07:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2090
07:19 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2090
07:18 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2090
07:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2090.codfw.wmnet 97.0.192.10.in-addr.arpa 7.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
07:17 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2090.codfw.wmnet 97.0.192.10.in-addr.arpa 7.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
07:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:17 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2090 - bking@cumin2002"
07:17 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2090 - bking@cumin2002"
07:09 bking@cumin2002: START - Cookbook sre.dns.netbox
07:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P74808 and previous config saved to /var/cache/conftool/dbconfig/20250409-070733-fceratto.json
07:05 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2090
07:05 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2090.codfw.wmnet with OS bullseye
06:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P74807 and previous config saved to /var/cache/conftool/dbconfig/20250409-065225-fceratto.json
06:47 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2090.codfw.wmnet on all recursors
06:47 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2090.codfw.wmnet on all recursors
06:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2090 to cirrussearch2090
06:46 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2090
06:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T391056)', diff saved to https://phabricator.wikimedia.org/P74806 and previous config saved to /var/cache/conftool/dbconfig/20250409-063718-fceratto.json
06:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T391056)', diff saved to https://phabricator.wikimedia.org/P74805 and previous config saved to /var/cache/conftool/dbconfig/20250409-062542-fceratto.json
06:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2195.codfw.wmnet with reason: Maintenance
06:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T391056)', diff saved to https://phabricator.wikimedia.org/P74804 and previous config saved to /var/cache/conftool/dbconfig/20250409-062519-fceratto.json
06:20 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2090
06:20 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:20 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2090 to cirrussearch2090 - bking@cumin2002"
06:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P74803 and previous config saved to /var/cache/conftool/dbconfig/20250409-061012-fceratto.json
05:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repool ms1 T391317', diff saved to https://phabricator.wikimedia.org/P74802 and previous config saved to /var/cache/conftool/dbconfig/20250409-055903-marostegui.json
05:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P74801 and previous config saved to /var/cache/conftool/dbconfig/20250409-055504-fceratto.json
05:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2142.codfw.wmnet,db1152.eqiad.wmnet with reason: Maintenance
05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool ms1 T391317', diff saved to https://phabricator.wikimedia.org/P74800 and previous config saved to /var/cache/conftool/dbconfig/20250409-055028-marostegui.json
05:49 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2090 to cirrussearch2090 - bking@cumin2002"
05:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T391056)', diff saved to https://phabricator.wikimedia.org/P74799 and previous config saved to /var/cache/conftool/dbconfig/20250409-053957-fceratto.json
05:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T391056)', diff saved to https://phabricator.wikimedia.org/P74798 and previous config saved to /var/cache/conftool/dbconfig/20250409-052719-fceratto.json
05:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: Maintenance
05:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74797 and previous config saved to /var/cache/conftool/dbconfig/20250409-052656-fceratto.json
05:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P74796 and previous config saved to /var/cache/conftool/dbconfig/20250409-051149-fceratto.json
05:05 bking@cumin2002: START - Cookbook sre.dns.netbox
05:05 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2090 to cirrussearch2090
05:01 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
04:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P74795 and previous config saved to /var/cache/conftool/dbconfig/20250409-045642-fceratto.json
04:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74794 and previous config saved to /var/cache/conftool/dbconfig/20250409-044134-fceratto.json
04:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74793 and previous config saved to /var/cache/conftool/dbconfig/20250409-042846-fceratto.json
04:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2167.codfw.wmnet with reason: Maintenance
04:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74792 and previous config saved to /var/cache/conftool/dbconfig/20250409-042824-fceratto.json
04:17 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
04:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2089.codfw.wmnet with OS bullseye
04:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P74791 and previous config saved to /var/cache/conftool/dbconfig/20250409-041317-fceratto.json
03:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P74790 and previous config saved to /var/cache/conftool/dbconfig/20250409-035810-fceratto.json
03:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2089.codfw.wmnet with reason: host reimage
03:48 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2089.codfw.wmnet with reason: host reimage
03:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74789 and previous config saved to /var/cache/conftool/dbconfig/20250409-034302-fceratto.json
03:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2089
03:30 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2089
03:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74788 and previous config saved to /var/cache/conftool/dbconfig/20250409-033025-fceratto.json
03:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2166.codfw.wmnet with reason: Maintenance
03:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74787 and previous config saved to /var/cache/conftool/dbconfig/20250409-033001-fceratto.json
03:26 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2089
03:26 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2089.codfw.wmnet 92.0.192.10.in-addr.arpa 2.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
03:26 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2089.codfw.wmnet 92.0.192.10.in-addr.arpa 2.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
03:26 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
03:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2089 - bking@cumin2002"
03:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2089 - bking@cumin2002"
03:20 bking@cumin2002: START - Cookbook sre.dns.netbox
03:20 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2089
03:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2089.codfw.wmnet with OS bullseye
03:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2089.codfw.wmnet on all recursors
03:18 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2089.codfw.wmnet on all recursors
03:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2089 to cirrussearch2089
03:18 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2089
03:17 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2089
03:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
03:17 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2089 to cirrussearch2089 - bking@cumin2002"
03:15 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2089 to cirrussearch2089 - bking@cumin2002"
03:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74786 and previous config saved to /var/cache/conftool/dbconfig/20250409-031453-fceratto.json
03:09 bking@cumin2002: START - Cookbook sre.dns.netbox
03:09 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2089 to cirrussearch2089
03:08 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2088.codfw.wmnet with OS bullseye
03:08 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
03:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74785 and previous config saved to /var/cache/conftool/dbconfig/20250409-025946-fceratto.json
02:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74784 and previous config saved to /var/cache/conftool/dbconfig/20250409-024439-fceratto.json
02:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:34 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2050
02:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2050
02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2049
02:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2049
02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2048
02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
02:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2048
02:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti servers to codfw - jhancock@cumin2002"
02:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti servers to codfw - jhancock@cumin2002"
02:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74783 and previous config saved to /var/cache/conftool/dbconfig/20250409-023156-fceratto.json
02:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2165.codfw.wmnet with reason: Maintenance
02:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T391056)', diff saved to https://phabricator.wikimedia.org/P74782 and previous config saved to /var/cache/conftool/dbconfig/20250409-023134-fceratto.json
02:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
02:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
02:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox
02:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P74781 and previous config saved to /var/cache/conftool/dbconfig/20250409-021626-fceratto.json
02:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P74780 and previous config saved to /var/cache/conftool/dbconfig/20250409-020119-fceratto.json
01:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T391056)', diff saved to https://phabricator.wikimedia.org/P74779 and previous config saved to /var/cache/conftool/dbconfig/20250409-014612-fceratto.json
01:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2088
01:33 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2088
01:33 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2088
01:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2088.codfw.wmnet 91.0.192.10.in-addr.arpa 1.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
01:33 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2088.codfw.wmnet 91.0.192.10.in-addr.arpa 1.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
01:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:33 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2088 - bking@cumin2002"
01:33 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2088 - bking@cumin2002"
01:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T391056)', diff saved to https://phabricator.wikimedia.org/P74778 and previous config saved to /var/cache/conftool/dbconfig/20250409-013316-fceratto.json
01:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
01:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2164.codfw.wmnet with reason: Maintenance
01:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T391056)', diff saved to https://phabricator.wikimedia.org/P74777 and previous config saved to /var/cache/conftool/dbconfig/20250409-013238-fceratto.json
01:24 bking@cumin2002: START - Cookbook sre.dns.netbox
01:24 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2088
01:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2088.codfw.wmnet with OS bullseye
01:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2088.codfw.wmnet on all recursors
01:23 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2088.codfw.wmnet on all recursors
01:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2088 to cirrussearch2088
01:22 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2088
01:21 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2088
01:21 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:21 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2088 to cirrussearch2088 - bking@cumin2002"
01:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2088 to cirrussearch2088 - bking@cumin2002"
01:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P74776 and previous config saved to /var/cache/conftool/dbconfig/20250409-011731-fceratto.json
01:15 bking@cumin2002: START - Cookbook sre.dns.netbox
01:15 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2088 to cirrussearch2088
01:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2062.codfw.wmnet with OS bullseye
01:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P74775 and previous config saved to /var/cache/conftool/dbconfig/20250409-010224-fceratto.json
00:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2062.codfw.wmnet with reason: host reimage
00:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T391056)', diff saved to https://phabricator.wikimedia.org/P74774 and previous config saved to /var/cache/conftool/dbconfig/20250409-004717-fceratto.json
00:46 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2062.codfw.wmnet with reason: host reimage
00:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T391056)', diff saved to https://phabricator.wikimedia.org/P74773 and previous config saved to /var/cache/conftool/dbconfig/20250409-003434-fceratto.json
00:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2163.codfw.wmnet with reason: Maintenance
00:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T391056)', diff saved to https://phabricator.wikimedia.org/P74772 and previous config saved to /var/cache/conftool/dbconfig/20250409-003412-fceratto.json
00:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2062
00:29 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2062
00:29 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2062
00:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2062.codfw.wmnet 144.0.192.10.in-addr.arpa 4.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
00:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2062.codfw.wmnet 144.0.192.10.in-addr.arpa 4.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
00:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:29 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2062 - bking@cumin2002"
00:29 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2062 - bking@cumin2002"
00:25 bking@cumin2002: START - Cookbook sre.dns.netbox
00:25 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2062
00:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2062.codfw.wmnet with OS bullseye
00:24 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2062.codfw.wmnet on all recursors
00:24 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2062.codfw.wmnet on all recursors
00:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2062 to cirrussearch2062
00:23 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2062
00:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2062
00:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:23 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2062 to cirrussearch2062 - bking@cumin2002"
00:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2062 to cirrussearch2062 - bking@cumin2002"
00:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P74771 and previous config saved to /var/cache/conftool/dbconfig/20250409-001905-fceratto.json
00:17 bking@cumin2002: START - Cookbook sre.dns.netbox
00:17 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2062 to cirrussearch2062
00:14 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
00:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P74770 and previous config saved to /var/cache/conftool/dbconfig/20250409-000358-fceratto.json

2025-04-09

12:03 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
12:03 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet

2025-04-08

23:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
23:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2069.codfw.wmnet with OS bullseye
23:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T391056)', diff saved to https://phabricator.wikimedia.org/P74769 and previous config saved to /var/cache/conftool/dbconfig/20250408-234850-fceratto.json
23:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2069.codfw.wmnet with reason: host reimage
23:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T391056)', diff saved to https://phabricator.wikimedia.org/P74768 and previous config saved to /var/cache/conftool/dbconfig/20250408-233611-fceratto.json
23:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2162.codfw.wmnet with reason: Maintenance
23:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T391056)', diff saved to https://phabricator.wikimedia.org/P74767 and previous config saved to /var/cache/conftool/dbconfig/20250408-233549-fceratto.json
23:35 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2069.codfw.wmnet with reason: host reimage
23:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2087.codfw.wmnet with OS bullseye
23:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P74766 and previous config saved to /var/cache/conftool/dbconfig/20250408-232042-fceratto.json
23:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P74765 and previous config saved to /var/cache/conftool/dbconfig/20250408-230535-fceratto.json
23:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2069
23:02 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2069
23:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2087.codfw.wmnet with reason: host reimage
23:02 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2069
23:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2069.codfw.wmnet 142.0.192.10.in-addr.arpa 2.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
23:02 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2069.codfw.wmnet 142.0.192.10.in-addr.arpa 2.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
23:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:02 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2069 - bking@cumin2002"
23:02 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2069 - bking@cumin2002"
22:56 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2087.codfw.wmnet with reason: host reimage
22:56 bking@cumin2002: START - Cookbook sre.dns.netbox
22:56 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2069
22:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2069.codfw.wmnet with OS bullseye
22:55 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2069.codfw.wmnet on all recursors
22:55 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2069.codfw.wmnet on all recursors
22:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2069 to cirrussearch2069
22:54 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2069
22:54 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2069
22:54 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:54 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2069 to cirrussearch2069 - bking@cumin2002"
22:53 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2069 to cirrussearch2069 - bking@cumin2002"
22:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T391056)', diff saved to https://phabricator.wikimedia.org/P74764 and previous config saved to /var/cache/conftool/dbconfig/20250408-225028-fceratto.json
22:49 bking@cumin2002: START - Cookbook sre.dns.netbox
22:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2069 to cirrussearch2069
22:48 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
22:40 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
22:40 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
22:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
22:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
22:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
22:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
22:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2087
22:39 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2087
22:39 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2087
22:39 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2087.codfw.wmnet 90.0.192.10.in-addr.arpa 0.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
22:39 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2087.codfw.wmnet 90.0.192.10.in-addr.arpa 0.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
22:39 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:39 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2087 - bking@cumin2002"
22:38 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2087 - bking@cumin2002"
22:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T391056)', diff saved to https://phabricator.wikimedia.org/P74763 and previous config saved to /var/cache/conftool/dbconfig/20250408-223744-fceratto.json
22:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2154.codfw.wmnet with reason: Maintenance
22:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T391056)', diff saved to https://phabricator.wikimedia.org/P74762 and previous config saved to /var/cache/conftool/dbconfig/20250408-223721-fceratto.json
22:34 bking@cumin2002: START - Cookbook sre.dns.netbox
22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
22:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
22:30 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2087
22:30 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2087.codfw.wmnet with OS bullseye
22:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
22:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2087.codfw.wmnet on all recursors
22:28 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2087.codfw.wmnet on all recursors
22:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P74761 and previous config saved to /var/cache/conftool/dbconfig/20250408-222213-fceratto.json
22:12 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
22:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P74760 and previous config saved to /var/cache/conftool/dbconfig/20250408-220706-fceratto.json
22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2087 to cirrussearch2087
22:04 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2087
22:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2087
22:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2087 to cirrussearch2087 - bking@cumin2002"
22:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2087 to cirrussearch2087 - bking@cumin2002"
22:02 ryankemper: T388610 Elasticsearch->Opensearch row a data node migration ongoing
21:58 bking@cumin2002: START - Cookbook sre.dns.netbox
21:58 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2087 to cirrussearch2087
21:57 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
21:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T391056)', diff saved to https://phabricator.wikimedia.org/P74759 and previous config saved to /var/cache/conftool/dbconfig/20250408-215159-fceratto.json
21:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T391056)', diff saved to https://phabricator.wikimedia.org/P74758 and previous config saved to /var/cache/conftool/dbconfig/20250408-214049-fceratto.json
21:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2152.codfw.wmnet with reason: Maintenance
21:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for LoginSignupSpecialPage: Get a login token before persisting the session (T390514), LoginSignupSpecialPage: Get a login token before persisting the session (T390514), [BETA CLUSTER] Decommission Beta Wikifunctions (T362200 T363397 T368161 T373464 T389274) (duration: 15m 42s)
21:32 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
21:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
21:31 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
21:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1257 (T391056)', diff saved to https://phabricator.wikimedia.org/P74757 and previous config saved to /var/cache/conftool/dbconfig/20250408-213136-fceratto.json
21:30 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
21:30 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
21:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
21:27 ladsgroup@deploy1003: ladsgroup, jforrester: Continuing with sync
21:25 ladsgroup@deploy1003: ladsgroup, jforrester: Backport for LoginSignupSpecialPage: Get a login token before persisting the session (T390514), LoginSignupSpecialPage: Get a login token before persisting the session (T390514), [BETA CLUSTER] Decommission Beta Wikifunctions (T362200 T363397 T368161 T373464 T389274) synced to the testservers (https://wikitech.wikimed
21:19 brett: import libvmod-netmapper 1.9-4 to component/varnish6 bullseye-wikimedia (T391334)
21:18 ladsgroup@deploy1003: Started scap sync-world: Backport for LoginSignupSpecialPage: Get a login token before persisting the session (T390514), LoginSignupSpecialPage: Get a login token before persisting the session (T390514), [BETA CLUSTER] Decommission Beta Wikifunctions (T362200 T363397 T368161 T373464 T389274)
21:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1257', diff saved to https://phabricator.wikimedia.org/P74756 and previous config saved to /var/cache/conftool/dbconfig/20250408-211629-fceratto.json
21:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1257', diff saved to https://phabricator.wikimedia.org/P74755 and previous config saved to /var/cache/conftool/dbconfig/20250408-210121-fceratto.json
20:51 brett: import libvmod-querysort 0.4-2 to component/varnish6 bullseye-wikimedia (T391334)
20:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1257 (T391056)', diff saved to https://phabricator.wikimedia.org/P74754 and previous config saved to /var/cache/conftool/dbconfig/20250408-204615-fceratto.json
20:37 brett: import varnish-modules 0.15.0-3 to component/varnish6 bullseye-wikimedia (T391334)
20:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1257 (T391056)', diff saved to https://phabricator.wikimedia.org/P74753 and previous config saved to /var/cache/conftool/dbconfig/20250408-203618-fceratto.json
20:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1257.eqiad.wmnet with reason: Maintenance
20:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1256.eqiad.wmnet with reason: Maintenance
20:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for ArticleFooterEntrypointCard: Fix display of entrypoint (T389176), ArticleFooterEntrypointCard: Fix display of entrypoint (T389176) (duration: 14m 16s)
20:25 brett: import varnishkafka 1.1.0-4 to component/varnish6 bullseye-wikimedia (T391334)
20:22 brett: import libvmod-re2 1.5.3-4 to component/varnish6 bullseye-wikimedia (T391334)
20:19 ladsgroup@deploy1003: abi, ladsgroup: Continuing with sync
20:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1255.eqiad.wmnet with reason: Maintenance
20:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T391056)', diff saved to https://phabricator.wikimedia.org/P74752 and previous config saved to /var/cache/conftool/dbconfig/20250408-201845-fceratto.json
20:17 ladsgroup@deploy1003: abi, ladsgroup: Backport for ArticleFooterEntrypointCard: Fix display of entrypoint (T389176), ArticleFooterEntrypointCard: Fix display of entrypoint (T389176) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:12 ladsgroup@deploy1003: Started scap sync-world: Backport for ArticleFooterEntrypointCard: Fix display of entrypoint (T389176), ArticleFooterEntrypointCard: Fix display of entrypoint (T389176)
20:12 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases2003.codfw.wmnet with OS bookworm
20:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P74751 and previous config saved to /var/cache/conftool/dbconfig/20250408-200338-fceratto.json
19:56 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
19:53 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
19:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P74750 and previous config saved to /var/cache/conftool/dbconfig/20250408-194831-fceratto.json
19:33 aokoth@cumin1002: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bookworm
19:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T391056)', diff saved to https://phabricator.wikimedia.org/P74749 and previous config saved to /var/cache/conftool/dbconfig/20250408-193324-fceratto.json
19:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T391056)', diff saved to https://phabricator.wikimedia.org/P74748 and previous config saved to /var/cache/conftool/dbconfig/20250408-192147-fceratto.json
19:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1226.eqiad.wmnet with reason: Maintenance
19:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1216.eqiad.wmnet with reason: Maintenance
19:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74747 and previous config saved to /var/cache/conftool/dbconfig/20250408-191215-fceratto.json
18:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P74746 and previous config saved to /var/cache/conftool/dbconfig/20250408-185708-fceratto.json
18:46 dancy@deploy1003: Installation of scap version "4.152.0" completed for 2 hosts
18:44 dancy@deploy1003: Installing scap version "4.152.0" for 2 host(s)
18:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P74745 and previous config saved to /var/cache/conftool/dbconfig/20250408-184201-fceratto.json
18:34 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.24 refs T386219
18:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74743 and previous config saved to /var/cache/conftool/dbconfig/20250408-182654-fceratto.json
18:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74742 and previous config saved to /var/cache/conftool/dbconfig/20250408-181513-fceratto.json
18:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1214.eqiad.wmnet with reason: Maintenance
18:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74741 and previous config saved to /var/cache/conftool/dbconfig/20250408-181450-fceratto.json
18:08 brennen: 1.44.0-wmf.24 train status: no current blockers, moving to group0
18:03 brett: import varnish 6.0.13-1wm1 to component/varnish6 bullseye-wikimedia (T391334)
17:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P74740 and previous config saved to /var/cache/conftool/dbconfig/20250408-175944-fceratto.json
17:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P74739 and previous config saved to /var/cache/conftool/dbconfig/20250408-174436-fceratto.json
17:38 swfrench@deploy1003: Finished scap sync-world: Pilot scap run using PHP 8.1 container image for maintenance scripts - T390225 (duration: 03m 19s)
17:35 swfrench@deploy1003: Started scap sync-world: Pilot scap run using PHP 8.1 container image for maintenance scripts - T390225
17:32 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
17:30 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
17:30 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
17:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74738 and previous config saved to /var/cache/conftool/dbconfig/20250408-172929-fceratto.json
17:29 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
17:22 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host nokiatest2001.codfw.wmnet
17:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74737 and previous config saved to /var/cache/conftool/dbconfig/20250408-171753-fceratto.json
17:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1211.eqiad.wmnet with reason: Maintenance
17:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T391056)', diff saved to https://phabricator.wikimedia.org/P74736 and previous config saved to /var/cache/conftool/dbconfig/20250408-171731-fceratto.json
17:15 swfrench@deploy1003: Stopping before sync operations
17:14 swfrench@deploy1003: Started scap sync-world: Pilot stop-before-sync scap run using PHP 8.1 container image for maintenance scripts - T390225
17:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P74735 and previous config saved to /var/cache/conftool/dbconfig/20250408-170224-fceratto.json
16:51 ladsgroup@deploy1003: Finished scap sync-world: Backport for Revert "Temporarily enable mobile sitenotice for fawiki" (duration: 20m 49s)
16:50 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
16:50 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
16:50 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
16:50 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
16:50 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:50 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
16:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P74734 and previous config saved to /var/cache/conftool/dbconfig/20250408-164717-fceratto.json
16:45 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply
16:45 ladsgroup@deploy1003: ladsgroup: Continuing with sync
16:44 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply
16:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
16:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
16:41 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
16:41 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
16:41 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
16:40 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
16:40 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
16:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
16:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
16:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
16:38 ladsgroup@deploy1003: ladsgroup: Backport for Revert "Temporarily enable mobile sitenotice for fawiki" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T391056)', diff saved to https://phabricator.wikimedia.org/P74733 and previous config saved to /var/cache/conftool/dbconfig/20250408-163210-fceratto.json
16:31 ladsgroup@deploy1003: Started scap sync-world: Backport for Revert "Temporarily enable mobile sitenotice for fawiki"
16:24 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
16:24 hnowlan: running 'ipvsadm --delete-service --tcp-service 10.2.2.26:443 && ipvsadm --delete-service --tcp-service 10.2.2.5:443' on eqiad lvs to remove videoscaler and jobrunner services
16:24 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
16:24 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
16:24 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
16:24 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:23 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
16:22 hnowlan: running 'ipvsadm --delete-service --tcp-service 10.2.2.26:443 && ipvsadm --delete-service --tcp-service 10.2.2.5:443' on codfw lvs to remove videoscaler and jobrunner services
16:21 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
16:21 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
16:21 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
16:21 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
16:21 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:20 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T391056)', diff saved to https://phabricator.wikimedia.org/P74732 and previous config saved to /var/cache/conftool/dbconfig/20250408-162029-fceratto.json
16:20 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1209.eqiad.wmnet with reason: Maintenance
16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T391056)', diff saved to https://phabricator.wikimedia.org/P74731 and previous config saved to /var/cache/conftool/dbconfig/20250408-162007-fceratto.json
16:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P74730 and previous config saved to /var/cache/conftool/dbconfig/20250408-160501-fceratto.json
15:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P74729 and previous config saved to /var/cache/conftool/dbconfig/20250408-154954-fceratto.json
15:48 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host nokiatest2001.codfw.wmnet
15:45 herron@cumin1002: dbctl commit (dc=all): 'depooling db1246', diff saved to https://phabricator.wikimedia.org/P74728 and previous config saved to /var/cache/conftool/dbconfig/20250408-154509-herron.json
15:39 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
15:37 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
15:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T391056)', diff saved to https://phabricator.wikimedia.org/P74727 and previous config saved to /var/cache/conftool/dbconfig/20250408-153446-fceratto.json
15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T391056)', diff saved to https://phabricator.wikimedia.org/P74726 and previous config saved to /var/cache/conftool/dbconfig/20250408-152212-fceratto.json
15:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1203.eqiad.wmnet with reason: Maintenance
15:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74725 and previous config saved to /var/cache/conftool/dbconfig/20250408-152150-fceratto.json
15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P74724 and previous config saved to /var/cache/conftool/dbconfig/20250408-150643-fceratto.json
15:03 brennen@deploy1003: Finished deploy [phabricator/deployment@99aa712]: deploy phab1004 for T391357 (duration: 00m 38s)
15:03 brennen@deploy1003: Started deploy [phabricator/deployment@99aa712]: deploy phab1004 for T391357
15:02 brennen@deploy1003: Finished deploy [phabricator/deployment@99aa712]: test deploy phab2002 for T391357 (duration: 00m 42s)
15:02 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet with reason: T391357
15:02 brennen@deploy1003: Started deploy [phabricator/deployment@99aa712]: test deploy phab2002 for T391357
15:01 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: T391357
14:54 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host releases2003.codfw.wmnet with OS bookworm
14:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P74723 and previous config saved to /var/cache/conftool/dbconfig/20250408-145136-fceratto.json
14:36 hnowlan: restarting pybal on A:lvs-low-traffic-codfw to remove jobrunner and videoscaler
14:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74722 and previous config saved to /var/cache/conftool/dbconfig/20250408-143628-fceratto.json
14:31 hnowlan: restarting pybal on A:lvs-secondary-codfw
14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74721 and previous config saved to /var/cache/conftool/dbconfig/20250408-142347-fceratto.json
14:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1192.eqiad.wmnet with reason: Maintenance
14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74720 and previous config saved to /var/cache/conftool/dbconfig/20250408-142335-fceratto.json
14:22 hnowlan: restarting pybal on lvs1019 (low-traffic primary) to pick up removal of jobrunner and videoscaler
14:19 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddumps1001.wikimedia.org with reason: down for maintenance
14:12 hnowlan: restarting pybal on A:lvs-secondary-eqiad to pick up removal of jobrunner and videoscaler
14:11 Lucas_WMDE: UTC afternoon backport+config window done
14:10 hnowlan: setting jobrunner and videoscaler to service_setup in puppet
14:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P74718 and previous config saved to /var/cache/conftool/dbconfig/20250408-140828-fceratto.json
14:08 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
14:07 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
14:07 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
14:06 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
14:04 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for ArticleFooterEntrypointCard: Change the way codex is loaded (T389176), ArticleFooterEntrypointCard: Change the way codex is loaded (T389176) (duration: 22m 23s)
14:02 aokoth@cumin1002: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bookworm
13:59 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
13:59 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
13:57 lucaswerkmeister-wmde@deploy1003: abi, lucaswerkmeister-wmde: Continuing with sync
13:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P74717 and previous config saved to /var/cache/conftool/dbconfig/20250408-135321-fceratto.json
13:49 lucaswerkmeister-wmde@deploy1003: abi, lucaswerkmeister-wmde: Backport for ArticleFooterEntrypointCard: Change the way codex is loaded (T389176), ArticleFooterEntrypointCard: Change the way codex is loaded (T389176) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:45 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on releases2003.codfw.wmnet with reason: Bookworm Re-image
13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for ArticleFooterEntrypointCard: Change the way codex is loaded (T389176), ArticleFooterEntrypointCard: Change the way codex is loaded (T389176)
13:38 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Increase entityAccessLimit from 400 to 500 for all wikis except commons. (T384455), Remove unused config vars (T389429), Fix EntitySchema propertyType on Test Wikidata (T371196) (duration: 15m 30s)
13:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74716 and previous config saved to /var/cache/conftool/dbconfig/20250408-133814-fceratto.json
13:31 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, ebernhardson, seanleong-wmde: Continuing with sync
13:30 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, ebernhardson, seanleong-wmde: Backport for Increase entityAccessLimit from 400 to 500 for all wikis except commons. (T384455), Remove unused config vars (T389429), Fix EntitySchema propertyType on Test Wikidata (T371196) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74715 and previous config saved to /var/cache/conftool/dbconfig/20250408-132626-fceratto.json
13:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74714 and previous config saved to /var/cache/conftool/dbconfig/20250408-132603-fceratto.json
13:22 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Increase entityAccessLimit from 400 to 500 for all wikis except commons. (T384455), Remove unused config vars (T389429), Fix EntitySchema propertyType on Test Wikidata (T371196)
13:18 Lucas_WMDE: lucaswerkmeister-wmde@deploy1003 ~ $ mwscript-k8s --comment=T391299 --follow -- namespaceDupes ptwiktionary --fix
13:17 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [ptwiktionary] Create a Wikisaurus namespace (T391299) (duration: 15m 24s)
13:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P74712 and previous config saved to /var/cache/conftool/dbconfig/20250408-131056-fceratto.json
13:10 lucaswerkmeister-wmde@deploy1003: superpes, lucaswerkmeister-wmde: Continuing with sync
13:09 lucaswerkmeister-wmde@deploy1003: superpes, lucaswerkmeister-wmde: Backport for [ptwiktionary] Create a Wikisaurus namespace (T391299) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:08 marostegui: TEST maintenance s1 eqiad dbmaint T391346
13:02 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [ptwiktionary] Create a Wikisaurus namespace (T391299)
12:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
12:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
12:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P74711 and previous config saved to /var/cache/conftool/dbconfig/20250408-125549-fceratto.json
12:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
12:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
12:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74709 and previous config saved to /var/cache/conftool/dbconfig/20250408-124042-fceratto.json
12:35 elukey: started the rollout of xz-utils' security upgrades (gradually, over the next few days)
12:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74708 and previous config saved to /var/cache/conftool/dbconfig/20250408-122919-fceratto.json
12:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
12:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T391056)', diff saved to https://phabricator.wikimedia.org/P74707 and previous config saved to /var/cache/conftool/dbconfig/20250408-122859-fceratto.json
12:14 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P74706 and previous config saved to /var/cache/conftool/dbconfig/20250408-121352-fceratto.json
12:13 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:13 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:12 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
12:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P74705 and previous config saved to /var/cache/conftool/dbconfig/20250408-115845-fceratto.json
11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T391056)', diff saved to https://phabricator.wikimedia.org/P74704 and previous config saved to /var/cache/conftool/dbconfig/20250408-114338-fceratto.json
11:39 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:39 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T391056)', diff saved to https://phabricator.wikimedia.org/P74703 and previous config saved to /var/cache/conftool/dbconfig/20250408-113154-fceratto.json
11:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1172.eqiad.wmnet with reason: Maintenance
11:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host wikikube-worker2142.codfw.wmnet
11:27 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2142.codfw.wmnet
11:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74702 and previous config saved to /var/cache/conftool/dbconfig/20250408-112124-fceratto.json
11:13 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 75% (T360589) (duration: 16m 35s)
11:06 ladsgroup@deploy1003: ladsgroup: Continuing with sync
11:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P74701 and previous config saved to /var/cache/conftool/dbconfig/20250408-110618-fceratto.json
11:04 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 75% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:02 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
11:02 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
11:01 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=thanos-fe2007.codfw.wmnet
11:01 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=thanos-fe2006.codfw.wmnet
11:01 mvernon@cumin2002: conftool action : set/weight=100; selector: name=thanos-fe2007.codfw.wmnet
11:01 mvernon@cumin2002: conftool action : set/weight=100; selector: name=thanos-fe2006.codfw.wmnet
11:01 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=thanos-fe2005.codfw.wmnet
11:00 mvernon@cumin2002: conftool action : set/weight=100; selector: name=thanos-fe2005.codfw.wmnet
11:00 Emperor: pool thanos-fe200[5-7] T389634
11:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3008.esams.wmnet} and A:liberica
10:59 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3008.esams.wmnet} and A:liberica
10:57 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 75% (T360589)
10:56 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
10:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P74700 and previous config saved to /var/cache/conftool/dbconfig/20250408-105111-fceratto.json
10:51 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
10:48 hnowlan@dns1004: END - running authdns-update
10:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2007.codfw.wmnet
10:45 hnowlan@dns1004: START - running authdns-update
10:41 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-fe2007.codfw.wmnet
10:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2006.codfw.wmnet
10:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74699 and previous config saved to /var/cache/conftool/dbconfig/20250408-103604-fceratto.json
10:34 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
10:33 jelto: restart mailman3.service on lists1004 - T391330
10:33 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
10:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-fe2006.codfw.wmnet
10:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2005.codfw.wmnet
10:25 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-fe2005.codfw.wmnet
10:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74698 and previous config saved to /var/cache/conftool/dbconfig/20250408-102412-fceratto.json
10:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
10:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: Maintenance
09:57 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:42 ozge@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:08 akosiaris@dns1004: END - running authdns-update
09:05 akosiaris@dns1004: START - running authdns-update
08:29 kartik@deploy1003: Finished scap sync-world: Backport for EventStreamConfig: Add RRLA prediction_change stream (T326179) (duration: 23m 21s)
08:29 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2003.codfw.wmnet
08:29 mvernon@cumin2002: conftool action : set/weight=40; selector: service=apus,name=apus-fe2003.codfw.wmnet
08:28 Emperor: pool apus-fe2003 T390578
08:22 kartik@deploy1003: kartik, kevinbazira: Continuing with sync
08:12 kartik@deploy1003: kartik, kevinbazira: Backport for EventStreamConfig: Add RRLA prediction_change stream (T326179) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repool ms1 T391317', diff saved to https://phabricator.wikimedia.org/P74695 and previous config saved to /var/cache/conftool/dbconfig/20250408-081248-marostegui.json
08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool ms1 T391317', diff saved to https://phabricator.wikimedia.org/P74694 and previous config saved to /var/cache/conftool/dbconfig/20250408-081224-marostegui.json
08:05 kartik@deploy1003: Started scap sync-world: Backport for EventStreamConfig: Add RRLA prediction_change stream (T326179)
08:02 kartik@deploy1003: Finished scap sync-world: Backport for AX: Enable entry-points on Tswana and Venetian wiki (T390023) (duration: 21m 33s)
07:55 kartik@deploy1003: abi, kartik: Continuing with sync
07:48 kartik@deploy1003: abi, kartik: Backport for AX: Enable entry-points on Tswana and Venetian wiki (T390023) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:41 kartik@deploy1003: Started scap sync-world: Backport for AX: Enable entry-points on Tswana and Venetian wiki (T390023)
07:35 slyngshede@dns1004: END - running authdns-update
07:34 kartik@deploy1003: Finished scap sync-world: Backport for AX: Enable Quick Surveys extension on Tswana and Venetian wiki (T390023) (duration: 20m 27s)
07:33 slyngshede@dns1004: START - running authdns-update
07:25 kartik@deploy1003: abi, kartik: Continuing with sync
07:21 kartik@deploy1003: abi, kartik: Backport for AX: Enable Quick Surveys extension on Tswana and Venetian wiki (T390023) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:13 kartik@deploy1003: Started scap sync-world: Backport for AX: Enable Quick Surveys extension on Tswana and Venetian wiki (T390023)
06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repool ms2 T391317', diff saved to https://phabricator.wikimedia.org/P74693 and previous config saved to /var/cache/conftool/dbconfig/20250408-064813-marostegui.json
06:45 marostegui: Upgrade ms2 to MariaDB 10.11 codfw eqiad dbmaint T391317
06:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Maintenance
06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool ms2 T391317', diff saved to https://phabricator.wikimedia.org/P74692 and previous config saved to /var/cache/conftool/dbconfig/20250408-064250-marostegui.json
04:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T391056)', diff saved to https://phabricator.wikimedia.org/P74691 and previous config saved to /var/cache/conftool/dbconfig/20250408-045801-fceratto.json
04:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P74690 and previous config saved to /var/cache/conftool/dbconfig/20250408-044254-fceratto.json
04:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P74689 and previous config saved to /var/cache/conftool/dbconfig/20250408-042748-fceratto.json
04:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T391056)', diff saved to https://phabricator.wikimedia.org/P74688 and previous config saved to /var/cache/conftool/dbconfig/20250408-041241-fceratto.json
04:09 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.21 (duration: 09m 26s)
04:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T391056)', diff saved to https://phabricator.wikimedia.org/P74687 and previous config saved to /var/cache/conftool/dbconfig/20250408-040728-fceratto.json
04:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2224.codfw.wmnet with reason: Maintenance
04:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T391056)', diff saved to https://phabricator.wikimedia.org/P74686 and previous config saved to /var/cache/conftool/dbconfig/20250408-040706-fceratto.json
04:06 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.44.0-wmf.24 refs T386219 (duration: 63m 43s)
03:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P74685 and previous config saved to /var/cache/conftool/dbconfig/20250408-035159-fceratto.json
03:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P74684 and previous config saved to /var/cache/conftool/dbconfig/20250408-033652-fceratto.json
03:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T391056)', diff saved to https://phabricator.wikimedia.org/P74683 and previous config saved to /var/cache/conftool/dbconfig/20250408-032145-fceratto.json
03:16 cstone: payments-wiki upgraded from 10b6cf1d to ef9284aa
03:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T391056)', diff saved to https://phabricator.wikimedia.org/P74682 and previous config saved to /var/cache/conftool/dbconfig/20250408-031632-fceratto.json
03:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2217.codfw.wmnet with reason: Maintenance
03:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74681 and previous config saved to /var/cache/conftool/dbconfig/20250408-031609-fceratto.json
03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.24 refs T386219
03:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P74680 and previous config saved to /var/cache/conftool/dbconfig/20250408-030102-fceratto.json
02:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P74679 and previous config saved to /var/cache/conftool/dbconfig/20250408-024555-fceratto.json
02:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74678 and previous config saved to /var/cache/conftool/dbconfig/20250408-023047-fceratto.json
02:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74677 and previous config saved to /var/cache/conftool/dbconfig/20250408-022538-fceratto.json
02:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2214.codfw.wmnet with reason: Maintenance
02:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2197.codfw.wmnet with reason: Maintenance
02:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T391056)', diff saved to https://phabricator.wikimedia.org/P74676 and previous config saved to /var/cache/conftool/dbconfig/20250408-022146-fceratto.json
02:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P74675 and previous config saved to /var/cache/conftool/dbconfig/20250408-020639-fceratto.json
01:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P74674 and previous config saved to /var/cache/conftool/dbconfig/20250408-015132-fceratto.json
01:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T391056)', diff saved to https://phabricator.wikimedia.org/P74673 and previous config saved to /var/cache/conftool/dbconfig/20250408-013625-fceratto.json
01:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T391056)', diff saved to https://phabricator.wikimedia.org/P74672 and previous config saved to /var/cache/conftool/dbconfig/20250408-013412-fceratto.json
01:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: Maintenance
01:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74671 and previous config saved to /var/cache/conftool/dbconfig/20250408-013348-fceratto.json
01:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P74670 and previous config saved to /var/cache/conftool/dbconfig/20250408-011841-fceratto.json
01:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P74669 and previous config saved to /var/cache/conftool/dbconfig/20250408-010334-fceratto.json
00:48 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1202.eqiad.wmnet
00:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74668 and previous config saved to /var/cache/conftool/dbconfig/20250408-004827-fceratto.json
00:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74667 and previous config saved to /var/cache/conftool/dbconfig/20250408-004715-fceratto.json
00:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2180.codfw.wmnet with reason: Maintenance
00:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T391056)', diff saved to https://phabricator.wikimedia.org/P74666 and previous config saved to /var/cache/conftool/dbconfig/20250408-004652-fceratto.json
00:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1202.eqiad.wmnet
00:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P74665 and previous config saved to /var/cache/conftool/dbconfig/20250408-003144-fceratto.json
00:22 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1202.eqiad.wmnet
00:21 btullis@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1202.eqiad.wmnet
00:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P74664 and previous config saved to /var/cache/conftool/dbconfig/20250408-001637-fceratto.json
00:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1202.eqiad.wmnet with OS bullseye
00:12 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
00:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T391056)', diff saved to https://phabricator.wikimedia.org/P74663 and previous config saved to /var/cache/conftool/dbconfig/20250408-000130-fceratto.json

2025-04-07

23:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T391056)', diff saved to https://phabricator.wikimedia.org/P74662 and previous config saved to /var/cache/conftool/dbconfig/20250407-235541-fceratto.json
23:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: Maintenance
23:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T391056)', diff saved to https://phabricator.wikimedia.org/P74661 and previous config saved to /var/cache/conftool/dbconfig/20250407-235518-fceratto.json
23:44 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
23:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P74660 and previous config saved to /var/cache/conftool/dbconfig/20250407-234011-fceratto.json
23:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P74659 and previous config saved to /var/cache/conftool/dbconfig/20250407-232503-fceratto.json
23:21 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1202.eqiad.wmnet with reason: host reimage
23:18 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1202.eqiad.wmnet with reason: host reimage
23:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T391056)', diff saved to https://phabricator.wikimedia.org/P74658 and previous config saved to /var/cache/conftool/dbconfig/20250407-230956-fceratto.json
23:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T391056)', diff saved to https://phabricator.wikimedia.org/P74657 and previous config saved to /var/cache/conftool/dbconfig/20250407-230411-fceratto.json
23:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
23:03 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
23:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2158.codfw.wmnet with reason: Maintenance
23:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T391056)', diff saved to https://phabricator.wikimedia.org/P74656 and previous config saved to /var/cache/conftool/dbconfig/20250407-230333-fceratto.json
22:57 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1171.eqiad.wmnet
22:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P74655 and previous config saved to /var/cache/conftool/dbconfig/20250407-224827-fceratto.json
22:35 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1171.eqiad.wmnet
22:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P74654 and previous config saved to /var/cache/conftool/dbconfig/20250407-223319-fceratto.json
22:32 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_magru
22:30 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1170.eqiad.wmnet
22:26 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1170.eqiad.wmnet
22:26 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1171.eqiad.wmnet
22:24 btullis@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1171.eqiad.wmnet
22:24 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1170.eqiad.wmnet
22:20 btullis@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1170.eqiad.wmnet
22:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T391056)', diff saved to https://phabricator.wikimedia.org/P74653 and previous config saved to /var/cache/conftool/dbconfig/20250407-221812-fceratto.json
22:16 ejegg: civicrm upgraded from f7beb984 to b20436a2
22:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T391056)', diff saved to https://phabricator.wikimedia.org/P74652 and previous config saved to /var/cache/conftool/dbconfig/20250407-221224-fceratto.json
22:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2151.codfw.wmnet with reason: Maintenance
22:12 ejegg: civicrm upgraded from 73533b73 to f7beb984
22:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
22:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T391056)', diff saved to https://phabricator.wikimedia.org/P74651 and previous config saved to /var/cache/conftool/dbconfig/20250407-220851-fceratto.json
21:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P74650 and previous config saved to /var/cache/conftool/dbconfig/20250407-215342-fceratto.json
21:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P74649 and previous config saved to /var/cache/conftool/dbconfig/20250407-213835-fceratto.json
21:26 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp700[3-8].magru.wmnet} and A:cp
21:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T391056)', diff saved to https://phabricator.wikimedia.org/P74647 and previous config saved to /var/cache/conftool/dbconfig/20250407-212328-fceratto.json
21:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T391056)', diff saved to https://phabricator.wikimedia.org/P74646 and previous config saved to /var/cache/conftool/dbconfig/20250407-212220-fceratto.json
21:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: Maintenance
21:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance
21:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T391056)', diff saved to https://phabricator.wikimedia.org/P74645 and previous config saved to /var/cache/conftool/dbconfig/20250407-211835-fceratto.json
21:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P74644 and previous config saved to /var/cache/conftool/dbconfig/20250407-210328-fceratto.json
20:55 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2056.codfw.wmnet
20:55 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2055.codfw.wmnet
20:49 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
20:49 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
20:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P74643 and previous config saved to /var/cache/conftool/dbconfig/20250407-204821-fceratto.json
20:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1096.eqiad.wmnet
20:39 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1096.eqiad.wmnet
20:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T391056)', diff saved to https://phabricator.wikimedia.org/P74642 and previous config saved to /var/cache/conftool/dbconfig/20250407-203313-fceratto.json
20:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T391056)', diff saved to https://phabricator.wikimedia.org/P74641 and previous config saved to /var/cache/conftool/dbconfig/20250407-203205-fceratto.json
20:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1201.eqiad.wmnet with reason: Maintenance
20:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T391056)', diff saved to https://phabricator.wikimedia.org/P74640 and previous config saved to /var/cache/conftool/dbconfig/20250407-203142-fceratto.json
20:20 James_F: Backport window complete.
20:19 jforrester@deploy1003: Finished scap sync-world: Backport for search-redirect: Handle $_GET potential vulnerability scanning (T389019), wikifunctionswiki: Make 'native' mode the default for Maths (duration: 14m 06s)
20:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P74638 and previous config saved to /var/cache/conftool/dbconfig/20250407-201635-fceratto.json
20:12 jforrester@deploy1003: jforrester: Continuing with sync
20:09 jforrester@deploy1003: jforrester: Backport for search-redirect: Handle $_GET potential vulnerability scanning (T389019), wikifunctionswiki: Make 'native' mode the default for Maths synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:05 jforrester@deploy1003: Started scap sync-world: Backport for search-redirect: Handle $_GET potential vulnerability scanning (T389019), wikifunctionswiki: Make 'native' mode the default for Maths
20:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P74637 and previous config saved to /var/cache/conftool/dbconfig/20250407-200128-fceratto.json
19:48 urandom: extending vg0/srv logical volume, sessionstore100[4-6].eqiad.wmnet — T390514
19:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T391056)', diff saved to https://phabricator.wikimedia.org/P74636 and previous config saved to /var/cache/conftool/dbconfig/20250407-194621-fceratto.json
19:44 urandom: extending vg0/srv logical volume, sessionstore2006 — T390514
19:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T391056)', diff saved to https://phabricator.wikimedia.org/P74635 and previous config saved to /var/cache/conftool/dbconfig/20250407-194412-fceratto.json
19:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1187.eqiad.wmnet with reason: Maintenance
19:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74634 and previous config saved to /var/cache/conftool/dbconfig/20250407-194350-fceratto.json
19:41 urandom: extending vg0/srv logical volume, sessionstore2005 — T390514
19:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P74633 and previous config saved to /var/cache/conftool/dbconfig/20250407-192842-fceratto.json
19:19 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1096* for ban node to stop high rejection rates - bking@cumin2002
19:19 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1096* for ban node to stop high rejection rates - bking@cumin2002
19:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P74632 and previous config saved to /var/cache/conftool/dbconfig/20250407-191335-fceratto.json
19:12 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1202.eqiad.wmnet with OS bullseye
19:06 urandom: extending vg0/srv logical volume, sessionstore2004 — T390514
18:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74631 and previous config saved to /var/cache/conftool/dbconfig/20250407-185828-fceratto.json
18:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74630 and previous config saved to /var/cache/conftool/dbconfig/20250407-185619-fceratto.json
18:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1180.eqiad.wmnet with reason: Maintenance
18:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T391056)', diff saved to https://phabricator.wikimedia.org/P74629 and previous config saved to /var/cache/conftool/dbconfig/20250407-185556-fceratto.json
18:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_magru
18:49 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp700[3-8].magru.wmnet} and A:cp
18:48 wfan: payments-wiki upgraded from 646f47bf to 10b6cf1d
18:43 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7002.magru.wmnet
18:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P74628 and previous config saved to /var/cache/conftool/dbconfig/20250407-184049-fceratto.json
18:32 dancy@deploy1003: Finished scap sync-world: testing (duration: 05m 35s)
18:30 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
18:26 dancy@deploy1003: Started scap sync-world: testing
18:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P74627 and previous config saved to /var/cache/conftool/dbconfig/20250407-182542-fceratto.json
18:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T391056)', diff saved to https://phabricator.wikimedia.org/P74625 and previous config saved to /var/cache/conftool/dbconfig/20250407-181035-fceratto.json
18:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
18:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T391056)', diff saved to https://phabricator.wikimedia.org/P74624 and previous config saved to /var/cache/conftool/dbconfig/20250407-180927-fceratto.json
18:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1168.eqiad.wmnet with reason: Maintenance
18:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74623 and previous config saved to /var/cache/conftool/dbconfig/20250407-180905-fceratto.json
18:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1202.eqiad.wmnet with OS bullseye
17:59 brett: Upload varnishkafka 1.2.0-2 to bullseye-wikimedia (T389605)
17:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P74622 and previous config saved to /var/cache/conftool/dbconfig/20250407-175358-fceratto.json
17:50 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Xiaoxiao out of all services on: 2397 hosts
17:44 brett: Remove libvmod-netmapper, libvmod-querysort, varnish-re2, varnish, varnishkafka, varnish-modules from bullseye-wikimedia component/varnish-staging
17:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P74621 and previous config saved to /var/cache/conftool/dbconfig/20250407-173851-fceratto.json
17:27 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7002.magru.wmnet
17:26 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
17:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74620 and previous config saved to /var/cache/conftool/dbconfig/20250407-172343-fceratto.json
17:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74619 and previous config saved to /var/cache/conftool/dbconfig/20250407-172234-fceratto.json
17:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
17:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1165.eqiad.wmnet with reason: Maintenance
17:17 brett: Re-enabling Puppet on A:cp (T378737)
17:04 brett: Disabling puppet on A:cp to roll out removal of varnish 6/7 template switching (T378737)
17:04 dancy@deploy1003: Installation of scap version "4.151.0" completed for 190 hosts
16:59 dancy@deploy1003: Installing scap version "4.151.0" for 190 host(s)
16:54 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Xiaoxiao out of all services on: 2396 hosts
16:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
16:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1202.eqiad.wmnet with OS bullseye
16:33 brett: Upload ncmonitor 1.3.4-1 to bookworm-wikimedia
16:30 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1003* for test ban syntax - bking@cumin2002 - T391151
16:30 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1003* for test ban syntax - bking@cumin2002 - T391151
16:29 mforns@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
16:29 mforns@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
16:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
16:24 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-worker1202.eqiad.wmnet on all recursors
16:24 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache an-worker1202.eqiad.wmnet on all recursors
16:23 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-worker1202.eqiad.wmnet on all recursors
16:23 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache an-worker1202.eqiad.wmnet on all recursors
16:17 mforns@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
16:17 mforns@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
16:15 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
16:15 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
16:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:08 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1004* for test ban syntax - bking@cumin2002 - T391151
16:08 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1004* for test ban syntax - bking@cumin2002 - T391151
16:07 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in relforge
16:07 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in relforge
15:59 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:58 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:56 jclark@cumin1002: START - Cookbook sre.dns.netbox
15:49 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
15:49 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
15:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
15:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
15:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:40 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:28 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
15:28 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
15:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:23 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
15:23 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
15:22 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-worker1202 - jclark@cumin1002"
15:22 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-worker1202 - jclark@cumin1002"
15:21 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2016.codfw.wmnet
15:21 mvernon@cumin1002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2016.codfw.wmnet
15:21 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2016.codfw.wmnet
15:21 mvernon@cumin1002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2016.codfw.wmnet
15:21 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2015.codfw.wmnet
15:21 mvernon@cumin1002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2015.codfw.wmnet
15:21 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2015.codfw.wmnet
15:21 mvernon@cumin1002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2015.codfw.wmnet
15:21 Emperor: pool ms-fe2015 ms-fe2016 T388887
15:20 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
15:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
15:16 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
15:11 elukey@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
15:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2016.codfw.wmnet
15:07 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2015.codfw.wmnet
15:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:04 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:03 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2016.codfw.wmnet
15:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:01 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-fe2015.codfw.wmnet
15:01 elukey@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
15:01 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
15:01 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
14:59 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1202
14:59 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
14:55 urandom: enabling unchecked_tombstone_compaction on sessionstore Cassandra — T390514
14:31 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1169
14:31 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1169
14:29 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cirrussearch[2055-2056].codfw.wmnet with reason: adding net-new role
14:12 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:09 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
14:09 jclark@cumin1002: START - Cookbook sre.dns.netbox
14:09 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
14:09 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
14:09 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
14:07 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
14:07 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
14:07 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
14:01 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
14:01 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
14:00 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
14:00 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
13:59 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
13:54 James_F: Backport window complete.
13:52 jforrester@deploy1003: Finished scap sync-world: Backport for Improve GeoCrumbs fallback when page property is not (yet) set (T391128) (duration: 13m 25s)
13:44 jforrester@deploy1003: jforrester, cscott: Continuing with sync
13:44 fabfur: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134689 on A:cp-esams (T384227)
13:44 jforrester@deploy1003: jforrester, cscott: Backport for Improve GeoCrumbs fallback when page property is not (yet) set (T391128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:43 fabfur: disable puppet on A:cp-esams to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134689 (T384227)
13:38 jforrester@deploy1003: Started scap sync-world: Backport for Improve GeoCrumbs fallback when page property is not (yet) set (T391128)
13:38 jforrester@deploy1003: Sync cancelled.
13:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:36 jforrester@deploy1003: jforrester, cscott: Backport for Improve GeoCrumbs fallback when page property is not (yet) set (T391128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:36 taavi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudsw-b1.private.codfw.wikimedia.cloud on codfw recursors
13:36 taavi@cumin1002: START - Cookbook sre.dns.wipe-cache cloudsw-b1.private.codfw.wikimedia.cloud on codfw recursors
13:36 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.9-1wm1_amd64.changes: T390912
13:35 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:34 sukhe: sudo -i reprepro remove bullseye-wikimedia trafficserver: T390912
13:34 sukhe: sudo -i reprepro remove bullseye-wikimedia trafficserver
13:32 sukhe: depool cp4037: reverting to ATS 9.2.9
13:30 jforrester@deploy1003: Started scap sync-world: Backport for Improve GeoCrumbs fallback when page property is not (yet) set (T391128)
13:26 jforrester@deploy1003: Finished scap sync-world: Backport for Shift to Parsoid Fragment support v3 (T390420), Where Parsoid Read Views are the default, use it for MFE as well (T376048 T374578) (duration: 20m 54s)
13:24 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4037.*} and A:cp for 9.2.10-1wm1
13:22 sukhe: P{cp4037.*} and A:cp for 9.2.10-1wm1 T390912
13:21 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4037.*} and A:cp for 9.2.10-1wm1
13:20 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
13:20 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
13:20 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
13:19 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
13:19 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
13:19 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
13:18 jforrester@deploy1003: jforrester, cscott: Continuing with sync
13:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
13:13 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
13:10 jforrester@deploy1003: jforrester, cscott: Backport for Shift to Parsoid Fragment support v3 (T390420), Where Parsoid Read Views are the default, use it for MFE as well (T376048 T374578) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:07 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
13:05 jforrester@deploy1003: Started scap sync-world: Backport for Shift to Parsoid Fragment support v3 (T390420), Where Parsoid Read Views are the default, use it for MFE as well (T376048 T374578)
12:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
12:55 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1045.eqiad.wmnet
12:49 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
12:49 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1045.eqiad.wmnet
12:48 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
12:47 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
12:35 topranks: cloudsw1-d5-eqiad: add routes for WMCS OpenStack IPv6 aggregate to cloudgw VIP T389958
12:32 topranks: cloudsw1-c8-eqiad: add routes for WMCS OpenStack IPv6 aggregate to cloudgw VIP T389958
11:57 btullis@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on an-worker1169.eqiad.wmnet with reason: Moving to rack F8
11:38 topranks: enable EBGP between cr2-eqiad and cloudsw1-d5-eqiad (IPv6 / cloud vrf) T389958
11:25 topranks: enable EBGP between cr1-eqiad and cloudsw1-c8-eqiad (IPv6 / cloud vrf) T389958
11:00 ladsgroup@deploy1003: Finished scap sync-world: Backport for Revert "Take 2: Large math formulae should be scrollable" (T201233) (duration: 13m 12s)
11:00 fabfur: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134648 on A:cp-eqiad (T384227)
10:58 fabfur: disable puppet on A:cp-eqiad to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134648 (T384227)
10:53 ladsgroup@deploy1003: jdlrobson, ladsgroup: Continuing with sync
10:53 ladsgroup@deploy1003: jdlrobson, ladsgroup: Backport for Revert "Take 2: Large math formulae should be scrollable" (T201233) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:50 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
10:50 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for Revert "Take 2: Large math formulae should be scrollable" (T201233)
10:46 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 70% (T360589) (duration: 14m 22s)
10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
10:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
10:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
10:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
10:41 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
10:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
10:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
10:39 ladsgroup@deploy1003: ladsgroup: Continuing with sync
10:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
10:37 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 70% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:32 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 70% (T360589)
10:13 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:12 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:57 daniel@deploy1003: Finished scap sync-world: Backport for [pswiki] Change the logo and wordmark/tagline (T360851), [tawiki] Enable translator usergroup and only allows translator to use ContentTranslation (T391171) (duration: 18m 31s)
09:49 daniel@deploy1003: superpes, daniel: Continuing with sync
09:44 daniel@deploy1003: superpes, daniel: Backport for [pswiki] Change the logo and wordmark/tagline (T360851), [tawiki] Enable translator usergroup and only allows translator to use ContentTranslation (T391171) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:39 daniel@deploy1003: Started scap sync-world: Backport for [pswiki] Change the logo and wordmark/tagline (T360851), [tawiki] Enable translator usergroup and only allows translator to use ContentTranslation (T391171)
09:22 fabfur: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134630 on A:cp-drmrs (T384227)
09:18 fabfur: disable puppet on A:cp-drmrs to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134630 (T384227)
09:13 daniel@deploy1003: Finished scap sync-world: Backport for EventIngress: use getDeletedPage instead of getPageStateBefore (T388588 T391051) (duration: 19m 43s)
09:12 slyngshede@dns1004: END - running authdns-update
09:09 slyngshede@dns1004: START - running authdns-update
09:03 daniel@deploy1003: daniel: Continuing with sync
09:00 XioNoX: push pfw policies - T390908
08:58 daniel@deploy1003: daniel: Backport for EventIngress: use getDeletedPage instead of getPageStateBefore (T388588 T391051) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:53 daniel@deploy1003: Started scap sync-world: Backport for EventIngress: use getDeletedPage instead of getPageStateBefore (T388588 T391051)
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas1001.wikimedia.org
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas1001.wikimedia.org - ayounsi@cumin1002"
08:44 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas1001.wikimedia.org - ayounsi@cumin1002"
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas1001.wikimedia.org on all recursors
08:44 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache atlas1001.wikimedia.org on all recursors
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas1001.wikimedia.org - ayounsi@cumin1002"
08:43 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas1001.wikimedia.org - ayounsi@cumin1002"
08:39 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
08:39 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host atlas1001.wikimedia.org
08:12 daniel@deploy1003: Started scap sync-world: Backport for EventIngress: use getDeletedPage instead of getPageStateBefore (T388588 T391051)
08:10 fabfur: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133897 on A:cp-codfw (T384227)
08:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2160,2234].codfw.wmnet with reason: Maintenance
08:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2230.codfw.wmnet,db1176.eqiad.wmnet with reason: Maintenance
08:06 fabfur: disable puppet on A:cp-codfw to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133897 (T384227)
07:45 dcausse: T391122: reconciled 14 wikidata items (lost EventBus/eventgate events)
06:40 daniel@deploy1003: Started scap sync-world: Backport for EventIngress: use getDeletedPage instead of getPageStateBefore (T388588 T391051)

2025-04-04

21:18 inflatador: bking@apt1002 publish-wmf-opensearch-search-plugins_1.3.20-4 to component/opensearch13 bullseye-wikimedia 1134285
20:22 urandom: starting `nodetool garbage collect -j 2`, sessionstore Cassandra
19:03 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
19:03 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
18:57 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
18:56 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
18:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2045.codfw.wmnet with OS bookworm
18:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2046.codfw.wmnet with OS bookworm
18:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
17:12 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:04 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:03 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:01 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:45 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:35 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:48 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:41 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:11 tchin@deploy1003: Finished deploy [airflow-dags/analytics@bece0a7]: (no justification provided) (duration: 00m 34s)
15:11 tchin@deploy1003: Started deploy [airflow-dags/analytics@bece0a7]: (no justification provided)
15:05 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:04 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:03 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
15:03 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:03 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
15:03 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:00 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:59 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:55 tchin@deploy1003: Finished deploy [analytics/refinery@c4ab9ef] (thin): THIN [analytics/refinery@c4ab9efd] (duration: 00m 59s)
14:54 tchin@deploy1003: Started deploy [analytics/refinery@c4ab9ef] (thin): THIN [analytics/refinery@c4ab9efd]
14:53 tchin@deploy1003: Finished deploy [analytics/refinery@c4ab9ef]: [analytics/refinery@c4ab9efd] (duration: 02m 54s)
14:50 tchin@deploy1003: Started deploy [analytics/refinery@c4ab9ef]: [analytics/refinery@c4ab9efd]
14:49 tchin@deploy1003: Finished deploy [analytics/refinery@c4ab9ef] (hadoop-test): TEST [analytics/refinery@c4ab9efd] (duration: 03m 01s)
14:46 tchin@deploy1003: Started deploy [analytics/refinery@c4ab9ef] (hadoop-test): TEST [analytics/refinery@c4ab9efd]
14:45 tchin: Deploying refinery for T389162
14:43 claime: Extending root vg on mwmaint1002 by 20GB
13:11 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:01 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Add Item and CustomItem classes as properties to `$.ui.ooMenu` (T390949) (duration: 15m 04s)
10:54 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
10:54 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Add Item and CustomItem classes as properties to `$.ui.ooMenu` (T390949) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:46 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Add Item and CustomItem classes as properties to `$.ui.ooMenu` (T390949)
10:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be1070.eqiad.wmnet
10:38 mvernon@cumin1002: START - Cookbook sre.hosts.remove-downtime for ms-be1070.eqiad.wmnet
10:02 Emperor: bulk-VACUUM of container dbs ms-be1070 T377827
10:02 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1070.eqiad.wmnet with reason: vacuum overlarge container dbs
09:57 moritzm: installing vim security updates
09:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:44 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:29 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
06:45 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@d6ad899]: Update artifacts for analytics_test (duration: 00m 15s)
06:45 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@d6ad899]: Update artifacts for analytics_test
06:44 aqu@deploy1003: Finished deploy [airflow-dags/analytics@d6ad899]: Update artifacts for analytics (duration: 00m 35s)
06:44 aqu@deploy1003: Started deploy [airflow-dags/analytics@d6ad899]: Update artifacts for analytics
05:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on db2186.codfw.wmnet with reason: Maintenance in sanitarium
05:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on db1154.eqiad.wmnet with reason: Maintenance in sanitarium
05:02 TimStarling: on mwmaint1002 ran cleanupBlocks.php on all wikis
00:51 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
00:41 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
00:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
00:24 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
00:23 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
00:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
00:11 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
00:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply

2025-04-03

23:45 tstarling@deploy1003: Finished scap sync-world: Backport for Enable Codex and Multiblocks in German and Italian wiki (T377121) (duration: 15m 25s)
23:38 tstarling@deploy1003: hmonroy, tstarling: Continuing with sync
23:35 tstarling@deploy1003: hmonroy, tstarling: Backport for Enable Codex and Multiblocks in German and Italian wiki (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:30 tstarling@deploy1003: Started scap sync-world: Backport for Enable Codex and Multiblocks in German and Italian wiki (T377121)
21:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2056.codfw.wmnet with OS bullseye
21:37 James_F: Backport deploy done.
21:36 jforrester@deploy1003: Finished scap sync-world: Backport for Revert "VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias" (duration: 15m 28s)
21:29 jforrester@deploy1003: jforrester: Continuing with sync
21:29 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
21:29 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
21:28 jforrester@deploy1003: jforrester: Backport for Revert "VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:21 jforrester@deploy1003: Started scap sync-world: Backport for Revert "VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias"
21:19 jforrester@deploy1003: Sync cancelled.
21:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2056.codfw.wmnet with reason: host reimage
21:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2056.codfw.wmnet with reason: host reimage
21:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2056
21:06 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2056
21:06 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2056.codfw.wmnet with OS bullseye
21:06 jforrester@deploy1003: esanders, jforrester: Backport for Mobile insert menu: Exclude media and signature tools (T385851), VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias (T388604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:00 jforrester@deploy1003: Started scap sync-world: Backport for Mobile insert menu: Exclude media and signature tools (T385851), VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias (T388604)
20:27 jforrester@deploy1003: esanders, jforrester: Backport for wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase, Hide "Insert graph" tool in VE when graphs are disabled (T387501), Enable DiscussionTools visual enhancements on zhwiki (T379264), Revert "End EmailAuth enforcement group 2 test" synced to the testservers (https://wi
20:23 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.10-1wm1_amd64.changes: T379797
20:18 jforrester@deploy1003: Started scap sync-world: Backport for wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase, Hide "Insert graph" tool in VE when graphs are disabled (T387501), Enable DiscussionTools visual enhancements on zhwiki (T379264), Revert "End EmailAuth enforcement group 2 test"
20:13 jforrester@deploy1003: sync-world aborted: Backport for End EmailAuth enforcement group 2 test (T390662), wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase (duration: 00m 33s)
20:12 jforrester@deploy1003: Started scap sync-world: Backport for End EmailAuth enforcement group 2 test (T390662), wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase
19:34 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
19:34 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
19:33 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:33 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:32 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
19:32 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
19:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2056
19:20 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2056
19:19 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2056
19:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2056.codfw.wmnet 181.0.192.10.in-addr.arpa 1.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:19 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2056.codfw.wmnet 181.0.192.10.in-addr.arpa 1.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2056 - bking@cumin2002"
19:19 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2056 - bking@cumin2002"
19:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
19:17 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
19:15 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
19:14 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
19:14 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
19:14 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
19:13 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
19:13 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
19:13 bking@cumin2002: START - Cookbook sre.dns.netbox
19:13 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
19:13 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2056
19:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2056.codfw.wmnet with OS bullseye
19:13 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
19:13 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:12 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:12 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:11 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
19:06 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch2055*,cirrussearch2056* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
19:06 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch2055*,cirrussearch2056* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
19:02 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
19:02 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
18:21 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
18:20 dancy@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.23 refs T386218
18:12 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
18:08 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
18:06 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
18:04 dancy@deploy1003: Installation of scap version "4.149.0" completed for 2 hosts
18:03 dancy@deploy1003: Installing scap version "4.149.0" for 2 host(s)
17:57 reedy@deploy1003: Finished scap sync-world: Backport for Banner: More reading from primary... (T390956), CommonSettings-labs: Update BounceHandler config (duration: 17m 43s)
17:48 reedy@deploy1003: reedy: Continuing with sync
17:47 reedy@deploy1003: reedy: Backport for Banner: More reading from primary... (T390956), CommonSettings-labs: Update BounceHandler config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:39 reedy@deploy1003: Started scap sync-world: Backport for Banner: More reading from primary... (T390956), CommonSettings-labs: Update BounceHandler config
17:38 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up new PHP 8.1 production images (duration: 28m 57s)
17:32 dzahn@dns1004: END - running authdns-update
17:30 dzahn@dns1004: START - running authdns-update
17:12 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:11 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:11 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:11 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:11 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:10 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:10 swfrench@deploy1003: Started scap sync-world: Deployment to pick up new PHP 8.1 production images
17:02 sukhe@dns1004: END - running authdns-update
17:02 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
17:02 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
17:00 sukhe@dns1004: START - running authdns-update
16:58 reedy@deploy1003: Finished scap sync-world: Backport for Banner: While saving, do exists() against primary (T390956) (duration: 21m 33s)
16:54 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:54 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:51 reedy@deploy1003: reedy: Continuing with sync
16:44 reedy@deploy1003: reedy: Backport for Banner: While saving, do exists() against primary (T390956) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:37 reedy@deploy1003: Started scap sync-world: Backport for Banner: While saving, do exists() against primary (T390956)
16:37 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
16:37 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
16:36 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
16:36 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
16:36 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:36 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:29 reedy@deploy1003: Finished scap sync-world: Backport for Banner: Conditionally check for banner existence from primary db (T390956) (duration: 15m 13s)
16:22 hnowlan: decommissioning all but 1 eqiad jobrunner node in confctl
16:22 reedy@deploy1003: reedy: Continuing with sync
16:21 reedy@deploy1003: reedy: Backport for Banner: Conditionally check for banner existence from primary db (T390956) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
16:17 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
16:14 reedy@deploy1003: Started scap sync-world: Backport for Banner: Conditionally check for banner existence from primary db (T390956)
16:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1166-1168].eqiad.wmnet
16:06 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1166-1168].eqiad.wmnet
16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (#2) (T390662) (duration: 14m 15s)
15:58 hnowlan: running homer 'cr*eqiad*' commit for new wikikube workers
15:55 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1168.eqiad.wmnet with OS bookworm
15:53 ladsgroup@deploy1003: tgr, ladsgroup: Continuing with sync
15:52 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on elastic2056.codfw.wmnet with reason: adding net-new role
15:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1167.eqiad.wmnet with OS bookworm
15:52 ladsgroup@deploy1003: tgr, ladsgroup: Backport for Enable EmailAuth enforcement on group 2 for short test (#2) (T390662) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (#2) (T390662)
15:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1166.eqiad.wmnet with OS bookworm
15:40 reedy@deploy1003: Finished scap sync-world: Backport for Remove catching of db exception (T390956) (duration: 17m 28s)
15:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1168.eqiad.wmnet with reason: host reimage
15:34 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1167.eqiad.wmnet with reason: host reimage
15:33 reedy@deploy1003: reedy: Continuing with sync
15:32 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1168.eqiad.wmnet with reason: host reimage
15:31 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1167.eqiad.wmnet with reason: host reimage
15:30 reedy@deploy1003: reedy: Backport for Remove catching of db exception (T390956) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:24 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1166.eqiad.wmnet with reason: host reimage
15:22 reedy@deploy1003: Started scap sync-world: Backport for Remove catching of db exception (T390956)
15:21 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1166.eqiad.wmnet with reason: host reimage
15:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1168
15:17 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1168
15:17 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1168.eqiad.wmnet with OS bookworm
15:16 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1167
15:16 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1167
15:16 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1167.eqiad.wmnet with OS bookworm
15:16 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1166.eqiad.wmnet wikikube-worker1167.eqiad.wmnet wikikube-worker1168.eqiad.wmnet on all recursors
15:16 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1166.eqiad.wmnet wikikube-worker1167.eqiad.wmnet wikikube-worker1168.eqiad.wmnet on all recursors
15:15 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1438 to wikikube-worker1168
15:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1168
15:14 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:13 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1168
15:13 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:13 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1438 to wikikube-worker1168 - hnowlan@cumin1002"
15:13 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1438 to wikikube-worker1168 - hnowlan@cumin1002"
15:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1437 to wikikube-worker1167
15:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1167
15:10 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:09 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:09 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:09 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
15:09 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1167
15:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1437 to wikikube-worker1167 - hnowlan@cumin1002"
15:08 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:08 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1437 to wikikube-worker1167 - hnowlan@cumin1002"
15:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1166
15:06 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1166
15:06 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1166.eqiad.wmnet with OS bookworm
15:04 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1438 to wikikube-worker1168
15:03 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
15:03 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1437 to wikikube-worker1167
14:49 tgr@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (T390662) (duration: 16m 18s)
14:42 tgr@deploy1003: tgr: Continuing with sync
14:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2056* for ban node before reimaging - bking@cumin2002 - T388610
14:42 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2056* for ban node before reimaging - bking@cumin2002 - T388610
14:42 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic2056 for ban node before reimaging - bking@cumin2002 - T388610
14:42 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2056 for ban node before reimaging - bking@cumin2002 - T388610
14:39 tgr@deploy1003: tgr: Backport for Enable EmailAuth enforcement on group 2 for short test (T390662) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:33 tgr@deploy1003: Started scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (T390662)
14:27 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
14:22 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
14:18 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
14:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
14:12 taavi@deploy1003: Finished scap sync-world: re-syncing 1133581 (duration: 08m 58s)
14:05 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1420 to wikikube-worker1166
14:05 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1166
14:03 taavi@deploy1003: Started scap sync-world: re-syncing 1133581
14:03 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
14:03 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
14:02 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1166
14:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1420 to wikikube-worker1166 - hnowlan@cumin1002"
14:02 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1420 to wikikube-worker1166 - hnowlan@cumin1002"
13:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
13:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
13:55 taavi@deploy1003: scap failed: <CalledProcessError> Command '['helmfile', '-e', 'eqiad', '--selector', 'name=main', 'write-values', '--output-file-template', '/tmp/tmp1ws3xaaw']' returned non-zero exit status 1. (scap version: 4.148.0) (duration: 16m 20s)
13:54 taavi@deploy1003: cscott, taavi: Continuing with sync
13:51 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
13:51 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1420 to wikikube-worker1166
13:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
13:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
13:46 taavi@deploy1003: cscott, taavi: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:45 moritzm: imported imposm3 0.14.1-1 to apt.wikimedia.org for bookworm-wikimedia T389780 T381565
13:39 taavi@deploy1003: Started scap sync-world: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420)
13:38 taavi: install1004: kill a dead `/usr/bin/apt-mark showmanual` process holding puppet runs
13:34 taavi@deploy1003: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.44.0-wmf.22,1.44.0-wmf.23 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/
13:32 taavi@deploy1003: Started scap sync-world: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420)
13:30 taavi@deploy1003: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.44.0-wmf.22,1.44.0-wmf.23 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/
13:28 taavi@deploy1003: Started scap sync-world: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420)
13:28 akosiaris@dns1004: END - running authdns-update
13:27 taavi@deploy1003: Finished scap sync-world: Backport for Enable Parsoid Read Views on 13 wiktionaries (T390680), Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002) (duration: 19m 40s)
13:25 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
13:25 akosiaris@dns1004: START - running authdns-update
13:25 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
13:20 taavi@deploy1003: ihurbain, taavi: Continuing with sync
13:17 taavi@deploy1003: ihurbain, taavi: Backport for Enable Parsoid Read Views on 13 wiktionaries (T390680), Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:07 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
13:07 taavi@deploy1003: Started scap sync-world: Backport for Enable Parsoid Read Views on 13 wiktionaries (T390680), Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002)
13:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:05 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
13:04 jmm@cumin2002: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on A:thanos-fe
13:02 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
12:56 moritzm: prune now obsolete nginx packages from testreduce1002 T329529
12:55 godog: move k8s instances from prometheus1006 to prometheus1008 - T383232
12:55 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
12:54 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:53 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
12:53 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:48 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:47 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:42 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
12:28 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
12:25 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:24 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:22 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-test
12:21 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-test
12:16 moritzm: installing libxslt security updates
11:58 moritzm: installing Intel microcode security updates
11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
11:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
11:46 moritzm: installing Django security updates on Bullseye
11:37 moritzm: installing Python 3.9 security updates
11:33 topranks: reboot cr2-eqord to complete JunOS upgrade T364092
11:31 topranks: disable EBGP sessions to internet peers on cr2-eqord to prep for JunOS upgrade T364092
11:30 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-codfw,cr2-eqiad,cr2-eqord,cr2-eqord IPv6,cr3-ulsfo with reason: Upgrade cr2-eqord JunOS
11:07 moritzm: installing nodejs security updates
11:06 topranks: prepend AS paths announced to codfw/eqiad from eqord to prep for JunOS upgrade T364092
11:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 65% (T360589) (duration: 16m 34s)
10:55 ladsgroup@deploy1003: ladsgroup: Continuing with sync
10:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe2003.codfw.wmnet with OS bookworm
10:54 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
10:53 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 65% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:51 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
10:50 topranks: drain transport circuits to eqord (Chicago network pop) to prep for Junos upgrade cr2-eqord T364092
10:48 moritzm: remove nodejs from aqs* hosts, no longer used/needed and spares us needless security rollouts T350143
10:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 65% (T360589)
10:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe2003.codfw.wmnet with reason: host reimage
10:27 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe2003.codfw.wmnet with reason: host reimage
10:22 akosiaris@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
10:22 akosiaris@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
10:22 akosiaris@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:22 akosiaris@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
10:21 akosiaris@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:21 akosiaris@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:20 akosiaris@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:20 akosiaris@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:18 akosiaris@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:18 akosiaris@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
10:17 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
10:17 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
10:17 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
10:16 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
10:16 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
10:14 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
10:14 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:10 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
10:02 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on cp4047.ulsfo.wmnet with reason: HW errors
09:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
09:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
09:59 fabfur: disable puppet on A:cp-eqsin
09:59 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133850 to use TLS on tmpfs on A:cp-eqsin (T384227)
09:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
09:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
09:54 akosiaris: deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1133745 in all k8s ingresses to stop ingressgateway from forcefully setting the HTTP server header in the responses to "istio-envoy"
09:52 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:52 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:52 godog: lvextend --resizefs --size +1TB vg0/srv on mwlog[12]002
09:52 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:51 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
09:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
09:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
09:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
09:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
09:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
09:03 fabfur: secure deleting certificates in /etc/ssl/private from A:cp-ulsfo (T384227)
09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
08:53 fabfur: secure deleting certificates in /etc/ssl/private from A:cp-magru (T384227)
08:48 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided) (duration: 01m 03s)
08:47 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided)
08:46 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided) (duration: 00m 54s)
08:45 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided)
08:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
08:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
08:24 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
08:22 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
08:21 hashar: Upgrading CI Jenkins
08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
08:20 slyngshede@dns1004: END - running authdns-update
08:18 slyngshede@dns1004: START - running authdns-update
08:12 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
08:12 slyngshede@dns1004: START - running authdns-update
08:06 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
07:54 moritzm: failover ganeti masters in esams to ganeti3007/3008
07:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
07:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
07:44 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
07:44 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
07:38 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
07:38 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
07:36 moritzm: added spiderpig-access LDAP group T390338
07:31 fabfur: applying patch to use TLS on tmpfs on A:cp-ulsfo (T384227)
07:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
07:27 fabfur: disabling puppet on A:cp-ulsfo to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133405 (T384227)
07:22 elukey: restart docker on deploy1003 to pick up max-concurrent-uploads=1 - T390251
07:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
07:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
07:07 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
07:07 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
06:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
00:39 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2006
00:16 tstarling@deploy1003: Finished scap sync-world: Backport for Temporarily disable Lua profiler (T389734) (duration: 15m 04s)
00:15 zabe: zabe@mwmaint1002:~$ cat group2.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/AbuseFilter/maintenance/MigrateESRefToAflTable.php {} --deletedump /home/zabe/afl_text_table_deletedump/{} --dump /home/zabe/afl_text_table_dump/{} --sleep 0.4" # T381599
00:09 tstarling@deploy1003: tstarling: Continuing with sync
00:08 tstarling@deploy1003: tstarling: Backport for Temporarily disable Lua profiler (T389734) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
00:01 tstarling@deploy1003: Started scap sync-world: Backport for Temporarily disable Lua profiler (T389734)

2025-04-02

23:32 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore1006
23:28 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2005
22:38 jhathaway: puppet private repo changes completed, T385995
22:01 brett: Import ncmonitor 1.3.3 into bookworm-wikimedia
22:00 dreamyjazz@deploy1003: Finished scap sync-world: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904) (duration: 25m 19s)
21:55 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
21:53 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
21:52 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
21:41 dreamyjazz@deploy1003: dreamyjazz: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:37 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2004
21:35 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore1005
21:35 dreamyjazz@deploy1003: Started scap sync-world: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904)
21:31 reedy@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth enforcement on group 0/1 (T390662) (duration: 15m 42s)
21:23 reedy@deploy1003: reedy, tgr: Continuing with sync
21:21 reedy@deploy1003: reedy, tgr: Backport for Enable EmailAuth enforcement on group 0/1 (T390662) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:15 reedy@deploy1003: Started scap sync-world: Backport for Enable EmailAuth enforcement on group 0/1 (T390662)
21:07 reedy@deploy1003: Finished scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
21:00 reedy@deploy1003: d3r1ck01, matmarex, reedy: Continuing with sync
20:52 reedy@deploy1003: d3r1ck01, matmarex, reedy: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, [[gerrit:1133504|Remove redundant WaitConditionLoop from CentralAuthTokenManager]
20:47 reedy@deploy1003: Started scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
20:14 reedy@deploy1003: Started scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
19:54 jhathaway: rolling out a change to private repo, 1127150, please let me know if any issues arise when merging patches
18:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2003.codfw.wmnet with OS bookworm
18:35 dancy@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.23 refs T386218
18:35 cstone: SmashPig upgraded from b9310c06 to 642ae816
18:00 reedy@deploy1003: reedy: Continuing with sync
18:00 reedy@deploy1003: reedy: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[gerrit:1133471|EmailA
17:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:47 reedy@deploy1003: Started scap sync-world: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[ger
17:41 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
17:40 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
17:34 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
17:34 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
17:33 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
17:31 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
17:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
17:30 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
17:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
17:27 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
17:27 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
17:25 urandom: starting `nodetool garbagecollect` on sessionstore1004
17:17 urandom: updating Cassandra/sessionstore `gc_grace_seconds` to 259200 (from 864000)
17:13 brett: reloading varnish-frontend on A:cp and not A:cp-text_drmrs and not A:cp-text_codfw
17:08 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on cirrussearch2055.codfw.wmnet with reason: adding net-new role
16:52 reedy@deploy1003: Started scap sync-world: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[ger
16:27 vgutierrez: reload varnish on text@codfw to discard stale VCLs - T390846
16:26 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up change in mediawiki-deployments.yaml - T389499 (duration: 03m 21s)
16:25 swfrench@deploy1003: swfrench: Continuing with sync
16:24 swfrench@deploy1003: swfrench: Deployment to pick up change in mediawiki-deployments.yaml - T389499 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:23 vgutierrez: reload varnish on text@drmrs to discard stale VCLs - T390846
16:23 swfrench@deploy1003: Started scap sync-world: Deployment to pick up change in mediawiki-deployments.yaml - T389499
16:10 swfrench-wmf: run-puppet-agent on deploy1003 to pick up mediawiki-deployments.yaml changes - T389499
15:28 arnaudb@dns1004: END - running authdns-update
15:19 arnaudb@dns1004: START - running authdns-update
15:16 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: maintenance
15:15 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1003.wikimedia.org with reason: maintenance
15:07 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
15:06 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
14:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
14:43 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
14:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
14:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2003.codfw.wmnet with OS bookworm
14:35 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
14:18 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
14:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
14:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
14:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
14:12 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:12 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:10 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:10 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:07 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:06 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:06 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:06 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.8.0 - volans@cumin1002
14:05 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:05 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:04 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:04 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
14:03 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
14:01 volans: upgrading homer to version 0.8.0 to cumin hosts
14:01 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
14:00 volans@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.8.0 - volans@cumin1002
13:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
13:52 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
13:49 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
13:49 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
13:43 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
13:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
13:41 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
13:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
13:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
13:37 akosiaris: depool cp3066 for debugging T390854
13:37 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
13:35 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
13:33 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
13:24 Lucas_WMDE: UTC afternoon backport+config window done
13:21 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile (duration: 16m 55s)
13:19 moritzm: installing gnutls28 security updates on Bookworm
13:14 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
13:14 lucaswerkmeister-wmde@deploy1003: jakob, hashar, lucaswerkmeister-wmde: Continuing with sync
13:14 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
13:11 lucaswerkmeister-wmde@deploy1003: jakob, hashar, lucaswerkmeister-wmde: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:04 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile
12:58 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
12:58 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
12:58 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
12:57 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
12:57 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
12:57 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2003.codfw.wmnet
12:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74582 and previous config saved to /var/cache/conftool/dbconfig/20250402-124139-root.json
12:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2003.codfw.wmnet
12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2002.codfw.wmnet
12:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74581 and previous config saved to /var/cache/conftool/dbconfig/20250402-123029-root.json
12:28 jmm@dns1004: END - running authdns-update
12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2002.codfw.wmnet
12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74580 and previous config saved to /var/cache/conftool/dbconfig/20250402-122634-root.json
12:26 jmm@dns1004: START - running authdns-update
12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2001.codfw.wmnet
12:18 akosiaris@dns1004: END - running authdns-update
12:16 akosiaris@dns1004: START - running authdns-update
12:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74579 and previous config saved to /var/cache/conftool/dbconfig/20250402-121524-root.json
12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
12:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P74578 and previous config saved to /var/cache/conftool/dbconfig/20250402-121128-root.json
12:11 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
12:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
12:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P74577 and previous config saved to /var/cache/conftool/dbconfig/20250402-120018-root.json
11:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74576 and previous config saved to /var/cache/conftool/dbconfig/20250402-115623-root.json
11:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74575 and previous config saved to /var/cache/conftool/dbconfig/20250402-114512-root.json
11:44 fabfur: securely erase certificates from A:cp-magru and provide symlink for acmecerts (T384227)
11:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P74574 and previous config saved to /var/cache/conftool/dbconfig/20250402-114117-root.json
11:40 vgutierrez: restart varnish on cp6016 - T390846
11:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P74573 and previous config saved to /var/cache/conftool/dbconfig/20250402-113007-root.json
11:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P74572 and previous config saved to /var/cache/conftool/dbconfig/20250402-112611-root.json
11:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
11:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
11:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
11:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
11:19 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:18 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
11:17 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
11:17 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
11:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
11:16 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
11:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
11:16 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
11:15 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1043.eqiad.wmnet
11:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P74571 and previous config saved to /var/cache/conftool/dbconfig/20250402-111501-root.json
11:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74570 and previous config saved to /var/cache/conftool/dbconfig/20250402-111106-root.json
11:10 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
11:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1043.eqiad.wmnet
11:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
11:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 60% (T360589) (duration: 15m 11s)
11:04 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:03 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:03 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:03 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
11:03 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:03 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:03 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:02 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:01 ladsgroup@deploy1003: ladsgroup: Continuing with sync
11:00 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
11:00 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 60% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74569 and previous config saved to /var/cache/conftool/dbconfig/20250402-105956-root.json
10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P74568 and previous config saved to /var/cache/conftool/dbconfig/20250402-105601-root.json
10:53 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 60% (T360589)
10:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P74567 and previous config saved to /var/cache/conftool/dbconfig/20250402-104450-root.json
10:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74566 and previous config saved to /var/cache/conftool/dbconfig/20250402-104055-root.json
10:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74564 and previous config saved to /var/cache/conftool/dbconfig/20250402-102944-root.json
10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74563 and previous config saved to /var/cache/conftool/dbconfig/20250402-102549-root.json
10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
10:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
10:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
10:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74561 and previous config saved to /var/cache/conftool/dbconfig/20250402-101439-root.json
10:13 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
10:13 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
10:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P74560 and previous config saved to /var/cache/conftool/dbconfig/20250402-101044-root.json
10:10 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
10:09 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
10:09 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
10:09 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
09:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P74559 and previous config saved to /var/cache/conftool/dbconfig/20250402-095933-root.json
09:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74558 and previous config saved to /var/cache/conftool/dbconfig/20250402-095538-root.json
09:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2243 to dbctl depooled T381475', diff saved to https://phabricator.wikimedia.org/P74557 and previous config saved to /var/cache/conftool/dbconfig/20250402-095213-marostegui.json
09:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74556 and previous config saved to /var/cache/conftool/dbconfig/20250402-094428-root.json
09:41 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1257 to dbctl depooled T381475', diff saved to https://phabricator.wikimedia.org/P74555 and previous config saved to /var/cache/conftool/dbconfig/20250402-094109-marostegui.json
09:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
09:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1042.eqiad.wmnet
09:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
09:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
09:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1042.eqiad.wmnet
09:29 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
09:27 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
09:24 XioNoX: rebooting mr1-ulsfo - T390052
09:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
09:23 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mr1-ulsfo with reason: reboot
09:21 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
09:21 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
09:19 akosiaris@dns1004: END - running authdns-update
09:18 akosiaris: create mw-wikifunctions-ingress.discovery.wmnet and .svc records to facilitate the migration to ingress
09:17 moritzm: failover ganeti masters in drmrs to ganeti6001/6002
09:16 akosiaris@dns1004: START - running authdns-update
09:16 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
09:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
08:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti6001.drmrs.wmnet
08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
08:55 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
08:48 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
08:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
08:48 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
08:48 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
08:47 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
08:47 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
08:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
08:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
08:47 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
08:46 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
08:46 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:45 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:45 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
08:41 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:40 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:38 jmm@dns1004: END - running authdns-update
08:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:36 jmm@dns1004: START - running authdns-update
08:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:32 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
08:32 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:31 XioNoX: trunk sandbox vlan to eqiad row B ganeti - T385560
08:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:30 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:28 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:26 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:23 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:18 fabfur: repooled cp7001 (T384227)
08:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:49 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:49 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2013.*,lvs1019.*} and A:lvs
07:46 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2013.*,lvs1019.*} and A:lvs
07:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:39 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:36 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2014.*,lvs1020.*} and A:lvs
07:34 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2014.*,lvs1020.*} and A:lvs
07:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:29 fabfur: depool cp7001 to fix stale ocsp alert (T384227)
07:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:18 jmm@dns1004: END - running authdns-update
07:16 jmm@dns1004: START - running authdns-update
07:02 jmm@dns1004: END - running authdns-update
06:59 jmm@dns1004: START - running authdns-update
06:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet

2025-04-01

23:43 reedy@deploy1003: rebuilt and synchronized wikiversions files: pihwiki to .23
23:40 ladsgroup@dns1004: END - running authdns-update
23:38 ladsgroup@dns1004: START - running authdns-update
23:34 ladsgroup@dns1004: END - running authdns-update
23:32 ladsgroup@dns1004: START - running authdns-update
23:27 ladsgroup@dns1004: END - running authdns-update
23:25 ladsgroup@dns1004: START - running authdns-update
23:20 ladsgroup@dns1004: END - running authdns-update
23:18 ladsgroup@dns1004: START - running authdns-update
23:03 ladsgroup@dns1004: END - running authdns-update
23:00 ladsgroup@dns1004: START - running authdns-update
22:04 bking@cumin2002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for cirrussearch2055.codfw.wmnet: Renew puppet certificate - bking@cumin2002
21:41 mutante: deploy1003 sudo -u mwdeploy /usr/local/bin/mwscript-cleanup --debug eqiad
20:46 taavi@deploy1003: Finished scap sync-world: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642) (duration: 16m 59s)
20:39 taavi@deploy1003: migr, taavi: Continuing with sync
20:37 taavi@deploy1003: migr, taavi: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2006.codfw.wmnet with OS bullseye
20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2016.codfw.wmnet with OS bullseye
20:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:30 taavi@deploy1003: Started scap sync-world: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642)
20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2015.codfw.wmnet with OS bullseye
20:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2007.codfw.wmnet with OS bullseye
20:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2005.codfw.wmnet with OS bullseye
20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:21 taavi@deploy1003: Finished scap sync-world: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732) (duration: 14m 18s)
20:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:14 taavi@deploy1003: superpes, taavi: Continuing with sync
20:13 taavi@deploy1003: superpes, taavi: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2006.codfw.wmnet with reason: host reimage
20:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2016.codfw.wmnet with reason: host reimage
20:07 taavi@deploy1003: Started scap sync-world: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732)
20:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2015.codfw.wmnet with reason: host reimage
20:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2007.codfw.wmnet with reason: host reimage
19:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2005.codfw.wmnet with reason: host reimage
19:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2016.codfw.wmnet with reason: host reimage
19:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2007.codfw.wmnet with reason: host reimage
19:55 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2015.codfw.wmnet with reason: host reimage
19:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2006.codfw.wmnet with reason: host reimage
19:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2005.codfw.wmnet with reason: host reimage
19:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
19:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2016.codfw.wmnet with OS bullseye
19:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2007.codfw.wmnet with OS bullseye
19:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2015.codfw.wmnet with OS bullseye
19:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2006.codfw.wmnet with OS bullseye
19:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2005.codfw.wmnet with OS bullseye
19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['apus-fe2003']
19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe2016']
19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe2015']
19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2007']
19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2007']
19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2006']
19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2005']
19:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['thanos-fe2007']
19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2015']
19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2016']
19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['apus-fe2003']
19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2007']
19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2006']
19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2005']
19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2003
19:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2003
19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2016
19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2016
19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2015
19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2015
19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2007
19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2007
19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2006
19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2006
19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2005
19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2005
19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-fe2005-7, ms-fe2015-6, and apus-fe2003 to codfw - jhancock@cumin2002"
19:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-fe2005-7, ms-fe2015-6, and apus-fe2003 to codfw - jhancock@cumin2002"
19:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
18:50 cstone: payments-wiki upgraded from 19b1c505 to e090b97b
18:25 bking@cumin2002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cirrussearch2055.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
18:20 dzahn@dns1004: END - running authdns-update
18:19 mforns@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
18:19 mforns@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
18:17 dzahn@dns1004: START - running authdns-update
18:15 dancy@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.23 refs T386218
18:11 dancy@deploy1003: Testing. Disregard
17:58 herron@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-aux-rw,name=codfw
17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-rw,name=eqiad
17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-rw,name=codfw
17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-ro,name=codfw
17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-ro,name=eqiad
17:41 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2055.codfw.wmnet with OS bullseye
17:25 brett: importing varnishkafka 1.2.0-1 into bullseye-wikimedia main (T378737)
17:25 brett: importing libvmod-re2/varnish-re2 2.0.0-2~bpo11+wmf2 into bullseye-wikimedia main (T378737)
17:24 brett: importing libvmod-querysort 0.4-3 into bullseye-wikimedia main (T378737)
17:24 brett: importing libvmod-netmapper 1.9.1-1 into bullseye-wikimedia main (T378737)
17:23 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
17:23 brett: importing varnish-modules 0.20.0-2~bpo11 into bullseye-wikimedia main (T378737)
17:23 fabfur: repool cp7001, no certs removed (T384227)
17:22 brett: importing varnish 7.1.1-1.1~bpo11+wmf1 into bullseye-wikimedia main (T378737)
16:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2055
16:23 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2055
16:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2055.codfw.wmnet with OS bullseye
16:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
16:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
16:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2055.codfw.wmnet with OS bullseye
15:45 topranks: removing et-0/0/0 from ae0 bundle on cr3-ulsfo and cr4-ulsfo T390731
15:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on 27 hosts with reason: Maintenance in s2
15:27 dzahn@dns1004: END - running authdns-update
15:25 mutante: DNS - new project language 'nup' - Nupe (also known as Anufe, Nupenci, Nyinfe, and Tapa) is a Volta–Niger language of the Nupoid branch primarily spoken by the Nupe people of the North Central region of Nigeria.
15:24 dzahn@dns1004: START - running authdns-update
15:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:18 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:11 brennen@deploy1003: Finished deploy [phabricator/deployment@53fcaf8]: deploy phab1004 for T390737 (duration: 00m 36s)
15:10 brennen@deploy1003: Started deploy [phabricator/deployment@53fcaf8]: deploy phab1004 for T390737
15:09 brennen@deploy1003: Finished deploy [phabricator/deployment@53fcaf8]: test deploy phab2002 for T390737 (duration: 00m 39s)
15:08 brennen@deploy1003: Started deploy [phabricator/deployment@53fcaf8]: test deploy phab2002 for T390737
15:05 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: phabricator deploy
15:04 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: phabricator deploy
14:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:51 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet [reason: finished T390658]
14:50 fabfur: depooled cp7001 to test secure removal of unused certificates (T384227)
14:49 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2006.codfw.wmnet with OS bookworm
14:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2055
14:47 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2055
14:46 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2055
14:46 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2055.codfw.wmnet 180.0.192.10.in-addr.arpa 0.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:46 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2055.codfw.wmnet 180.0.192.10.in-addr.arpa 0.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:46 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:46 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2055 - bking@cumin2002"
14:46 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2055 - bking@cumin2002"
14:42 bking@cumin2002: START - Cookbook sre.dns.netbox
14:41 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:41 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
14:41 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
14:40 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
14:40 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
14:40 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2055
14:40 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2055.codfw.wmnet with OS bullseye
14:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2055 to cirrussearch2055
14:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2055
14:37 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2055
14:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2055 to cirrussearch2055 - bking@cumin2002"
14:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2055 to cirrussearch2055 - bking@cumin2002"
14:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for CommmonSettings: Remove old BounceHandler DB config (duration: 15m 28s)
14:32 bking@cumin2002: START - Cookbook sre.dns.netbox
14:31 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2055 to cirrussearch2055
14:28 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
14:27 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
14:26 ladsgroup@deploy1003: reedy, ladsgroup: Continuing with sync
14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74547 and previous config saved to /var/cache/conftool/dbconfig/20250401-142516-ladsgroup.json
14:24 ladsgroup@deploy1003: reedy, ladsgroup: Backport for CommmonSettings: Remove old BounceHandler DB config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
14:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74546 and previous config saved to /var/cache/conftool/dbconfig/20250401-142228-ladsgroup.json
14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for CommmonSettings: Remove old BounceHandler DB config
14:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
14:15 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74545 and previous config saved to /var/cache/conftool/dbconfig/20250401-141008-ladsgroup.json
14:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74544 and previous config saved to /var/cache/conftool/dbconfig/20250401-140721-ladsgroup.json
14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
14:05 elukey: roll restart nginx on registry* to remove debug logging - too much data, filling up the root partition
14:02 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host registry2005.codfw.wmnet
14:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2006.codfw.wmnet with OS bookworm
13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74543 and previous config saved to /var/cache/conftool/dbconfig/20250401-135501-ladsgroup.json
13:53 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host registry2005.codfw.wmnet
13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74542 and previous config saved to /var/cache/conftool/dbconfig/20250401-135215-ladsgroup.json
13:48 elukey: depool registry2005 to investigate some nginx logging issue
13:44 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2035.codfw.wmnet [reason: T390658]
13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74540 and previous config saved to /var/cache/conftool/dbconfig/20250401-133954-ladsgroup.json
13:39 elukey: restart nginx on registry2005 - stuck writing error logs
13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2005.codfw.wmnet with OS bookworm
13:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
13:37 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74539 and previous config saved to /var/cache/conftool/dbconfig/20250401-133707-ladsgroup.json
13:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
13:35 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm
13:29 Lucas_WMDE: UTC afternoon backport+config window done
13:28 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development (duration: 18m 04s)
13:27 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
13:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74537 and previous config saved to /var/cache/conftool/dbconfig/20250401-132407-ladsgroup.json
13:24 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2165.codfw.wmnet with reason: Maintenance
13:21 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, cjming, matmarex: Continuing with sync
13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74536 and previous config saved to /var/cache/conftool/dbconfig/20250401-132059-ladsgroup.json
13:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
13:20 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
13:20 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
13:18 moritzm: installing python-cryptography security updates
13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
13:17 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, cjming, matmarex: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74534 and previous config saved to /var/cache/conftool/dbconfig/20250401-131530-ladsgroup.json
13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
13:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
13:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
13:10 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development
13:05 elukey: restart nginx on registry* to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133112 - debug logs to /var/log/nginx/debug.log - T390251
13:04 XioNoX: msw2-eqiad> restart jsd gracefully - T390052
13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74533 and previous config saved to /var/cache/conftool/dbconfig/20250401-130023-ladsgroup.json
12:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm
12:48 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2005.codfw.wmnet with OS bookworm
12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2004.codfw.wmnet with OS bookworm
12:47 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
12:47 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74530 and previous config saved to /var/cache/conftool/dbconfig/20250401-124516-ladsgroup.json
12:44 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
12:44 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
12:43 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
12:43 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
12:42 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
12:42 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
12:42 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.zarcillo (exit_code=99)
12:41 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
12:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
12:41 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.zarcillo (exit_code=99)
12:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1039.eqiad.wmnet
12:39 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti4008.ulsfo.wmnet
12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
12:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
12:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
12:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74529 and previous config saved to /var/cache/conftool/dbconfig/20250401-123009-ladsgroup.json
12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
12:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
12:23 moritzm: installing PHP 7.4 security updates (as shipped in Debian, not our internal build running on a few remaining edge cases)
12:12 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
12:11 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
12:11 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
12:11 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
12:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
12:08 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
12:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
12:08 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
12:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2004.codfw.wmnet with OS bookworm
12:02 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
12:02 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
12:02 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
12:02 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74528 and previous config saved to /var/cache/conftool/dbconfig/20250401-115935-ladsgroup.json
11:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2003.codfw.wmnet with OS bookworm
11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P74527 and previous config saved to /var/cache/conftool/dbconfig/20250401-114428-ladsgroup.json
11:34 Lucas_WMDE: Deployed patch for T389369
11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P74526 and previous config saved to /var/cache/conftool/dbconfig/20250401-112921-ladsgroup.json
11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
11:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
11:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet
11:24 moritzm: installing squid security updates
11:22 hashar: Restarting Gerrit
11:19 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
11:18 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet
11:16 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti4008.ulsfo.wmnet
11:16 topranks: reboot cr4-ulsfo to upgrade JunOS T364092
11:15 hashar: Restarted Gerrit replica on gerrit2002 to raise heap from 32G to 64G | T387223
11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74525 and previous config saved to /var/cache/conftool/dbconfig/20250401-111415-ladsgroup.json
11:13 volans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1002.eqiad.wmnet with reason: Test
11:12 moritzm: restarting FPM on phab1004 to pick up security update
11:10 volans: upgrading spicerack to v10.0.0 on cumin1002
11:10 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Upgrade cr4-ulsfo JunOS
11:06 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti4008.ulsfo.wmnet with reason: remove from cluster for reimage
11:06 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2003.codfw.wmnet with OS bookworm
11:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
11:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet
11:04 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
11:04 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
11:02 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
10:58 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 55% (T360589) (duration: 22m 03s)
10:58 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet
10:57 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
10:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet
10:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1006.eqiad.wmnet
10:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1211.eqiad.wmnet onto db1257.eqiad.wmnet
10:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1211 slowly with 10 steps - Pool db1211.eqiad.wmnet in after cloning
10:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74523 and previous config saved to /var/cache/conftool/dbconfig/20250401-105425-ladsgroup.json
10:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
10:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet
10:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1006.eqiad.wmnet
10:48 ladsgroup@deploy1003: ladsgroup: Continuing with sync
10:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74522 and previous config saved to /var/cache/conftool/dbconfig/20250401-104659-ladsgroup.json
10:46 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
10:46 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 55% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:45 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
10:44 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-test
10:43 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-test
10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
10:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
10:36 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 55% (T360589)
10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet
10:27 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
10:26 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet
10:25 akosiaris@deploy1003: Finished scap sync-world: Backport for typos: Add wnmet as a typo (duration: 29m 34s)
10:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
10:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm
10:19 jiji@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc-gp2004.codfw.wmnet
10:19 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
10:19 aqu@deploy1003: Finished deploy [airflow-dags/analytics@d96f732]: Update artifacts for analytics (duration: 00m 59s)
10:18 aqu@deploy1003: Started deploy [airflow-dags/analytics@d96f732]: Update artifacts for analytics
10:17 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
10:17 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@d96f732]: Update artifacts for analytics_test (duration: 00m 12s)
10:17 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@d96f732]: Update artifacts for analytics_test
10:17 jiji@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc-gp1004.eqiad.wmnet
10:16 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
10:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm
10:09 akosiaris@deploy1003: akosiaris: Continuing with sync
10:08 akosiaris@deploy1003: akosiaris: Backport for typos: Add wnmet as a typo synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:00 joal@deploy1003: Finished deploy [analytics/refinery@efc4808] (hadoop-test): Analytics webrequest migration TEST [analytics/refinery@efc48089] (duration: 00m 40s)
09:59 joal@deploy1003: Started deploy [analytics/refinery@efc4808] (hadoop-test): Analytics webrequest migration TEST [analytics/refinery@efc48089]
09:59 joal@deploy1003: Finished deploy [analytics/refinery@efc4808] (thin): Analytics webrequest migration THIN [analytics/refinery@efc48089] (duration: 00m 55s)
09:58 joal@deploy1003: Started deploy [analytics/refinery@efc4808] (thin): Analytics webrequest migration THIN [analytics/refinery@efc48089]
09:57 joal@deploy1003: Finished deploy [analytics/refinery@efc4808]: Analytics webrequest migration [analytics/refinery@efc48089] (duration: 02m 24s)
09:57 moritzm: installing freetype security updates
09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
09:55 akosiaris@deploy1003: Started scap sync-world: Backport for typos: Add wnmet as a typo
09:55 akosiaris: scap backport a noop change https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1133069 for T390251
09:55 joal@deploy1003: Started deploy [analytics/refinery@efc4808]: Analytics webrequest migration [analytics/refinery@efc48089]
09:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
09:50 elukey: restart nginx on registry* to pick up the debug changes
09:42 volans@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1001.eqiad.wmnet with reason: test
09:39 gmodena@deploy1003: Finished deploy [airflow-dags/search@ed0fc78]: Deploy mjolnir-2.7.0.dev.conda.tgz (duration: 01m 29s)
09:38 gmodena@deploy1003: Started deploy [airflow-dags/search@ed0fc78]: Deploy mjolnir-2.7.0.dev.conda.tgz
09:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm
09:27 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
09:26 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
09:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
08:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
08:58 dcausse@deploy1003: Finished deploy [wdqs/wdqs@354b5ac]: revert T326311, deletion query way too slow (duration: 12m 15s)
08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
08:50 hashar@deploy1003: Finished deploy [integration/docroot@5256e19]: build: Updating eslint-config-wikimedia to 0.29.1 (duration: 00m 09s)
08:50 hashar@deploy1003: Started deploy [integration/docroot@5256e19]: build: Updating eslint-config-wikimedia to 0.29.1
08:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw1-eqiad
08:46 topranks: Drain Lumen cct from codfw to ulsfo due to instability T390660
08:46 dcausse@deploy1003: Started deploy [wdqs/wdqs@354b5ac]: revert T326311, deletion query way too slow
08:45 volans: upgrading spicerack to v10.0.0 on cumin2002
08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw1-eqiad
08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw2-eqiad
08:38 marostegui@cumin1002: START - Cookbook sre.mysql.pool db1211 slowly with 10 steps - Pool db1211.eqiad.wmnet in after cloning
08:36 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw2-eqiad
08:36 moritzm: failover ganeti master in ulsfo to ganeti4005 T382511
08:35 volans: temporary disable puppet on cumin1002 for the spicerack upgrade to v10.0.0
08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw1-codfw
08:34 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4007
08:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4007
08:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
08:32 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw1-codfw
08:29 elukey: set debug logging for registry*'s nginx - T390251
08:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw2-codfw
08:29 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
08:27 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw2-codfw
08:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-eqiad
08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
08:18 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-eqiad
08:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-eqsin
08:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
08:16 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
08:14 dcausse: T390665: restart blazegraph on wdqs2017
08:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
08:12 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
08:12 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
08:11 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-eqsin
08:11 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
08:11 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
08:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-esams
08:05 dcausse: restarting blazegraph on wdqs2016
08:04 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
08:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
08:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4007.ulsfo.wmnet with OS bookworm
08:00 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
07:59 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
07:59 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-esams
07:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-drmrs
07:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-drmrs
07:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-magru
07:47 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
07:46 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
07:44 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-magru
07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
07:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-codfw
07:37 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
07:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-codfw
07:34 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
07:31 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
07:30 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
07:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
07:28 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
07:28 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
07:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-ulsfo
07:24 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
07:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4007.ulsfo.wmnet with OS bookworm
07:19 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
07:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1b-eqiad
07:17 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1b-eqiad
06:14 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1257.eqiad.wmnet
05:33 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@557a834]: 0.3.155 (duration: 12m 49s)
05:22 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.155` on canary `wdqs1015`; proceeding to rest of fleet
05:20 ryankemper@deploy1003: Started deploy [wdqs/wdqs@557a834]: 0.3.155
05:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.155`. Pre-deploy tests passing on canary `wdqs1016`
04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.20 (duration: 04m 34s)
03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.23 refs T386218
