Server Admin Log

From Wikitech

2025-04-26

  • 06:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 06:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 03:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2086.codfw.wmnet with OS bullseye
  • 02:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2086.codfw.wmnet with reason: host reimage
  • 02:51 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2086.codfw.wmnet with reason: host reimage
  • 02:49 sbassett: Deployed security fix for T392746
  • 02:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2086
  • 02:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2086
  • 02:34 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2086
  • 02:34 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2086.codfw.wmnet 179.48.192.10.in-addr.arpa 9.7.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 02:34 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2086.codfw.wmnet 179.48.192.10.in-addr.arpa 9.7.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 02:34 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:34 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2086 - bking@cumin2002"
  • 02:34 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2086 - bking@cumin2002"

2025-04-25

  • 21:56 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2086
  • 21:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2086.codfw.wmnet with OS bullseye
  • 20:58 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2086.codfw.wmnet with OS bullseye
  • 20:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2086.codfw.wmnet with OS bullseye
  • 20:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2086 to cirrussearch2086
  • 20:53 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2086
  • 20:53 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2086
  • 20:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:53 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2086 to cirrussearch2086 - bking@cumin2002"
  • 20:52 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2086 to cirrussearch2086 - bking@cumin2002"
  • 20:47 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 20:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 20:34 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:34 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2086 to cirrussearch2086
  • 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
  • 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:47 cstone: payments-wiki upgraded from c6ba1f35 to e7f66569
  • 19:39 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 19:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host relforge1010.eqiad.wmnet with OS bullseye
  • 19:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:26 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on relforge1010.eqiad.wmnet with reason: host reimage
  • 19:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on relforge1010.eqiad.wmnet with reason: host reimage
  • 19:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2084.codfw.wmnet with OS bullseye
  • 18:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host relforge1010.eqiad.wmnet with OS bullseye
  • 18:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2084.codfw.wmnet with reason: host reimage
  • 18:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:28 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2084.codfw.wmnet with reason: host reimage
  • 18:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 18:23 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 18:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 18:18 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 18:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 18:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2084
  • 18:14 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2084
  • 18:13 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2084
  • 18:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2084.codfw.wmnet 56.48.192.10.in-addr.arpa 6.5.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 18:13 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2084.codfw.wmnet 56.48.192.10.in-addr.arpa 6.5.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 18:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:13 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2084 - bking@cumin2002"
  • 18:13 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2084 - bking@cumin2002"
  • 18:08 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 17:14 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2084
  • 17:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2084.codfw.wmnet with OS bullseye
  • 17:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2084.codfw.wmnet on all recursors
  • 17:13 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2084.codfw.wmnet on all recursors
  • 17:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2084 to cirrussearch2084
  • 17:12 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2084
  • 17:08 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2084
  • 17:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:08 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2084 to cirrussearch2084 - bking@cumin2002"
  • 17:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2084 to cirrussearch2084 - bking@cumin2002"
  • 17:03 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 17:03 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2084 to cirrussearch2084
  • 16:58 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 16:56 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
  • 16:56 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
  • 16:47 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
  • 16:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2083.codfw.wmnet with OS bullseye
  • 16:44 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
  • 16:31 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
  • 16:27 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
  • 16:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2083.codfw.wmnet with reason: host reimage
  • 16:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2083.codfw.wmnet with reason: host reimage
  • 16:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2083
  • 16:08 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2083
  • 16:08 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2083
  • 16:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2083.codfw.wmnet 88.32.192.10.in-addr.arpa 8.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:07 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2083.codfw.wmnet 88.32.192.10.in-addr.arpa 8.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:07 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2083 - bking@cumin2002"
  • 16:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2083 - bking@cumin2002"
  • 16:03 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:03 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2083
  • 16:03 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2083.codfw.wmnet with OS bullseye
  • 16:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2083.codfw.wmnet on all recursors
  • 16:02 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2083.codfw.wmnet on all recursors
  • 16:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2083 to cirrussearch2083
  • 15:59 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2083
  • 15:59 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2083
  • 15:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:59 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2083 to cirrussearch2083 - bking@cumin2002"
  • 15:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2083 to cirrussearch2083 - bking@cumin2002"
  • 15:55 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:55 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2083 to cirrussearch2083
  • 15:54 dancy@deploy1003: Installation of scap version "4.157.1" completed for 2 hosts
  • 15:52 dancy@deploy1003: Installing scap version "4.157.1" for 2 host(s)
  • 15:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 15:38 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 15:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 15:29 dancy@deploy1003: Installation of scap version "4.157.0" completed for 2 hosts
  • 15:27 dancy@deploy1003: Installing scap version "4.157.0" for 2 host(s)
  • 15:18 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
  • 15:10 dancy: dancy@deploy1003 Cancelled
  • 15:09 dancy@deploy1003: Installing scap version "4.156.0" for 2 host(s)
  • 15:07 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
  • 15:07 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis nupwiki in section s5
  • 15:06 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis nupwiki in section s5
  • 14:46 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2081.codfw.wmnet with OS bullseye
  • 14:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2081.codfw.wmnet with reason: host reimage
  • 14:26 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis nupwiki in section s5
  • 14:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2081.codfw.wmnet with reason: host reimage
  • 14:23 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis nupwiki in section s5
  • 14:13 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudlb2001-dev.codfw.wmnet
  • 14:13 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2081
  • 14:11 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2081
  • 14:11 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 14:10 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2081
  • 14:10 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2081.codfw.wmnet 86.32.192.10.in-addr.arpa 6.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:10 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2081.codfw.wmnet 86.32.192.10.in-addr.arpa 6.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:10 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:10 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2081 - bking@cumin2002"
  • 14:10 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2081 - bking@cumin2002"
  • 14:09 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
  • 14:05 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:03 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
  • 14:02 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudlb2001-dev.codfw.wmnet
  • 13:59 Emperor: restart object-replicator on ms-be2089
  • 13:51 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2081
  • 13:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2081.codfw.wmnet with OS bullseye
  • 13:51 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
  • 13:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2081.codfw.wmnet on all recursors
  • 13:51 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2081.codfw.wmnet on all recursors
  • 13:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2081 to cirrussearch2081
  • 13:48 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2081
  • 13:48 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2081
  • 13:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2081 to cirrussearch2081 - bking@cumin2002"
  • 13:47 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2081 to cirrussearch2081 - bking@cumin2002"
  • 13:46 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
  • 13:45 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis nupwiki in section s5
  • 13:43 taavi@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2004-dev.codfw.wmnet
  • 13:43 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis nupwiki in section s5
  • 13:36 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis nupwiki in section s5
  • 13:34 taavi@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudlb2004-dev.codfw.wmnet
  • 13:33 taavi: add cloudlb2004-dev bgp session to cloudsw1-b1-codfw T377126
  • 13:33 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:32 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2081 to cirrussearch2081
  • 13:31 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis nupwiki in section s5
  • 13:29 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis nupwiki in section s5
  • 13:26 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis nupwiki in section s5
  • 13:08 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS trixie
  • 12:58 vgutierrez: restarting grafana-server.service @ grafana1002.eqiad.wmnet
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:31 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS trixie
  • 09:31 moritzm: restarting puppetserver on puppetserver1002 (apparently needs a restart which per timing seems related to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1138904)
  • 09:16 vgutierrez: restarting puppetserver on puppetserver1003
  • 09:11 taavi: removed cloudlb2001-dev bgp session from cloudsw1-b1-codfw T377126
  • 08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2032 to es1 master T391921', diff saved to https://phabricator.wikimedia.org/P75463 and previous config saved to /var/cache/conftool/dbconfig/20250425-082420-marostegui.json
  • 07:50 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2002-dev.codfw.wmnet
  • 07:44 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on krb1002.eqiad.wmnet with reason: work in progress, not yet active
  • 07:38 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
  • 07:13 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75462 and previous config saved to /var/cache/conftool/dbconfig/20250425-071339-root.json
  • 06:58 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75461 and previous config saved to /var/cache/conftool/dbconfig/20250425-065834-root.json
  • 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75460 and previous config saved to /var/cache/conftool/dbconfig/20250425-064329-root.json
  • 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75459 and previous config saved to /var/cache/conftool/dbconfig/20250425-062824-root.json
  • 06:13 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75458 and previous config saved to /var/cache/conftool/dbconfig/20250425-061319-root.json
  • 05:58 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75457 and previous config saved to /var/cache/conftool/dbconfig/20250425-055813-root.json
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75456 and previous config saved to /var/cache/conftool/dbconfig/20250425-054308-root.json
  • 05:42 marostegui@dns1006: END - running authdns-update
  • 05:39 marostegui@dns1006: START - running authdns-update
  • 05:37 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1032 to es1 master T391921', diff saved to https://phabricator.wikimedia.org/P75455 and previous config saved to /var/cache/conftool/dbconfig/20250425-053744-marostegui.json
  • 05:28 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75454 and previous config saved to /var/cache/conftool/dbconfig/20250425-052802-root.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75453 and previous config saved to /var/cache/conftool/dbconfig/20250425-051257-root.json
  • 05:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2030.codfw.wmnet with reason: Maintenance
  • 05:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2030 T391921', diff saved to https://phabricator.wikimedia.org/P75452 and previous config saved to /var/cache/conftool/dbconfig/20250425-050538-marostegui.json

2025-04-24

  • 23:47 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f1-codfw
  • 23:47 pt1979@cumin2002: START - Cookbook sre.network.tls for network device lsw1-f1-codfw
  • 23:42 eileen: config revision changed from 7bf2c087 to 1c84d1a7
  • 23:32 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 23:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 23:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 23:29 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 23:28 rzl@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 23:28 rzl@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 23:27 rzl@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 23:25 rzl@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 23:22 rzl@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 22:34 zabe@deploy1003: Finished scap sync-world: T390384 (duration: 11m 08s)
  • 22:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2080.codfw.wmnet with OS bullseye
  • 22:23 zabe@deploy1003: Started scap sync-world: T390384
  • 22:17 eileen: config revision changed from 47a5d384 to 7bf2c087
  • 22:15 zabe@deploy1003: Finished scap sync-world: Backport for Activate nupwiki (T390384) (duration: 11m 54s)
  • 22:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 22:11 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 22:11 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 22:10 eileen: update CiviCRM civicrm: revision 3ca2db06
  • 22:09 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 22:08 zabe@deploy1003: zabe: Continuing with sync
  • 22:07 zabe@deploy1003: zabe: Backport for Activate nupwiki (T390384) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2080.codfw.wmnet with reason: host reimage
  • 22:04 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2080.codfw.wmnet with reason: host reimage
  • 22:03 zabe@deploy1003: Started scap sync-world: Backport for Activate nupwiki (T390384)
  • 22:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 22:02 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 22:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 22:00 zabe@deploy1003: Finished scap sync-world: Backport for Prepare nupwiki (T390384) (duration: 13m 15s)
  • 21:54 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-f1-codfw.mgmt.codfw.wmnet
  • 21:53 zabe@deploy1003: zabe: Continuing with sync
  • 21:52 zabe@deploy1003: zabe: Backport for Prepare nupwiki (T390384) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2080
  • 21:49 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2080
  • 21:49 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2080
  • 21:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2080.codfw.wmnet 127.16.192.10.in-addr.arpa 7.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:49 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2080.codfw.wmnet 127.16.192.10.in-addr.arpa 7.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:49 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2080 - bking@cumin2002"
  • 21:49 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2080 - bking@cumin2002"
  • 21:47 zabe@deploy1003: Started scap sync-world: Backport for Prepare nupwiki (T390384)
  • 21:45 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:45 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2080
  • 21:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2080.codfw.wmnet with OS bullseye
  • 21:39 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2080.codfw.wmnet with OS bullseye
  • 21:39 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2080.codfw.wmnet with OS bullseye
  • 21:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2080 to cirrussearch2080
  • 21:35 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2080
  • 21:29 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for db1178.eqiad.wmnet: Renew puppet certificate - jhathaway@cumin1002
  • 21:28 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f1-codfw - pt1979@cumin2002"
  • 21:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f1-codfw - pt1979@cumin2002"
  • 21:26 jhathaway@cumin1002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for db1178.eqiad.wmnet: Renew puppet certificate - jhathaway@cumin1002
  • 21:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:24 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-f1-codfw.mgmt.codfw.wmnet
  • 21:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2080
  • 21:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:23 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2080 to cirrussearch2080 - bking@cumin2002"
  • 21:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2080 to cirrussearch2080 - bking@cumin2002"
  • 21:18 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:18 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2080 to cirrussearch2080
  • 21:13 jhathaway: restarting puppetserver1002 to test CRL
  • 20:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 20:58 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 20:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 20:57 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 20:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 20:51 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 20:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 20:38 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2078']
  • 20:35 jgleeson: payments-wiki upgraded from d250a3b8 to c6ba1f35
  • 20:29 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2078']
  • 20:29 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 20:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 20:22 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 20:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 20:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2076.codfw.wmnet with OS bullseye
  • 20:13 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 19:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2076.codfw.wmnet with reason: host reimage
  • 19:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 19:55 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 19:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 19:52 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 19:52 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2076.codfw.wmnet with reason: host reimage
  • 19:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 19:48 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 19:48 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 19:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2076
  • 19:38 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2076
  • 19:35 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2076
  • 19:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2076.codfw.wmnet 206.0.192.10.in-addr.arpa 6.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:35 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2076.codfw.wmnet 206.0.192.10.in-addr.arpa 6.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:35 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2076 - bking@cumin2002"
  • 19:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2076 - bking@cumin2002"
  • 19:30 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:29 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2076
  • 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating pdus in codfw - jhancock@cumin2002"
  • 19:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2076.codfw.wmnet with OS bullseye
  • 19:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating pdus in codfw - jhancock@cumin2002"
  • 19:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2076 to cirrussearch2076
  • 19:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2076
  • 19:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2076
  • 19:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2076 to cirrussearch2076 - bking@cumin2002"
  • 19:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2076 to cirrussearch2076 - bking@cumin2002"
  • 19:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 19:19 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:19 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2076 to cirrussearch2076
  • 17:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2073.codfw.wmnet with OS bullseye
  • 17:41 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 17:39 rzl@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 17:39 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f3-codfw
  • 17:39 pt1979@cumin2002: START - Cookbook sre.network.tls for network device lsw1-f3-codfw
  • 17:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 17:35 rzl@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 17:32 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2073.codfw.wmnet with reason: host reimage
  • 17:28 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2073.codfw.wmnet with reason: host reimage
  • 17:18 rzl@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:17 rzl@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-f3-codfw.mgmt.codfw.wmnet
  • 17:14 rzl@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2073
  • 17:14 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2073
  • 17:13 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2073
  • 17:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2073.codfw.wmnet 28.0.192.10.in-addr.arpa 8.2.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:13 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2073.codfw.wmnet 28.0.192.10.in-addr.arpa 8.2.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2073 - bking@cumin2002"
  • 17:13 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2073 - bking@cumin2002"
  • 17:09 rzl@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:09 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 17:08 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2073
  • 17:08 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2073.codfw.wmnet with OS bullseye
  • 17:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2073 to cirrussearch2073
  • 17:04 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2073
  • 17:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2073
  • 17:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2073 to cirrussearch2073 - bking@cumin2002"
  • 17:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2073 to cirrussearch2073 - bking@cumin2002"
  • 16:58 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:58 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2073 to cirrussearch2073
  • 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f3-codfw - pt1979@cumin2002"
  • 16:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f3-codfw - pt1979@cumin2002"
  • 16:37 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:37 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-f3-codfw.mgmt.codfw.wmnet
  • 16:24 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-codfw
  • 16:24 pt1979@cumin2002: START - Cookbook sre.network.tls for network device ssw1-f1-codfw
  • 16:21 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-f1-codfw.mgmt.codfw.wmnet
  • 16:16 brett: Delete source packages for varnish in bullseye-wikimedia
  • 16:08 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1045.eqiad.wmnet
  • 16:08 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1045.eqiad.wmnet
  • 16:08 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1044.eqiad.wmnet
  • 16:08 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1044.eqiad.wmnet
  • 16:08 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1043.eqiad.wmnet
  • 16:08 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1043.eqiad.wmnet
  • 15:58 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=eqiad,name=restbase1045.eqiad.wmnet
  • 15:58 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=eqiad,name=restbase1044.eqiad.wmnet
  • 15:58 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=eqiad,name=restbase1043.eqiad.wmnet
  • 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-f1-codfw - pt1979@cumin2002"
  • 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:45 moritzm: installing twitter-bootstrap3 security updates
  • 15:45 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-f1-codfw - pt1979@cumin2002"
  • 15:40 brett: remove libvarnishapi2 from bullseye-wikimedia main
  • 15:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:39 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-f1-codfw.mgmt.codfw.wmnet
  • 15:38 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:38 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 15:38 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:37 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 15:37 lucaswerkmeister-wmde@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:36 brett: remove varnish libvmod-netmapper libvmod-querysort libvmod-re2 varnish-modules libvarnishapi2 varnishkafka from buster-wikimedia
  • 15:36 lucaswerkmeister-wmde@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 14:59 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 14:59 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 14:59 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:59 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75449 and previous config saved to /var/cache/conftool/dbconfig/20250424-144923-root.json
  • 14:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75447 and previous config saved to /var/cache/conftool/dbconfig/20250424-143417-root.json
  • 14:25 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Fix EntitySchema propertyType on Wikidata (T371196) (duration: 12m 11s)
  • 14:19 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75446 and previous config saved to /var/cache/conftool/dbconfig/20250424-141911-root.json
  • 14:18 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
  • 14:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging2005']
  • 14:17 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Fix EntitySchema propertyType on Wikidata (T371196) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:13 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Fix EntitySchema propertyType on Wikidata (T371196)
  • 14:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
  • 14:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kafka-logging2005']
  • 14:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
  • 14:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging2005']
  • 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75445 and previous config saved to /var/cache/conftool/dbconfig/20250424-140406-root.json
  • 13:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
  • 13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging2005']
  • 13:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
  • 13:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
  • 13:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75444 and previous config saved to /var/cache/conftool/dbconfig/20250424-134900-root.json
  • 13:43 taavi@dns3004: END - running authdns-update
  • 13:40 taavi@dns3004: START - running authdns-update
  • 13:33 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75443 and previous config saved to /var/cache/conftool/dbconfig/20250424-133355-root.json
  • 13:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T391056)', diff saved to https://phabricator.wikimedia.org/P75442 and previous config saved to /var/cache/conftool/dbconfig/20250424-131928-fceratto.json
  • 13:18 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75441 and previous config saved to /var/cache/conftool/dbconfig/20250424-131850-root.json
  • 13:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P75440 and previous config saved to /var/cache/conftool/dbconfig/20250424-130421-fceratto.json
  • 13:03 kart_: Updated cxserver to 2025-04-15-070132-production (T391289)
  • 13:03 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75439 and previous config saved to /var/cache/conftool/dbconfig/20250424-130344-root.json
  • 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:58 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:55 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:54 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:53 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-fe1016.eqiad.wmnet with OS bullseye
  • 12:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P75436 and previous config saved to /var/cache/conftool/dbconfig/20250424-124914-fceratto.json
  • 12:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe1016.eqiad.wmnet with OS bullseye
  • 12:48 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75435 and previous config saved to /var/cache/conftool/dbconfig/20250424-124838-root.json
  • 12:36 ladsgroup@deploy1003: Finished scap sync-world: Backport for Add support for x3 db cluster (T351820) (duration: 14m 28s)
  • 12:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T391056)', diff saved to https://phabricator.wikimedia.org/P75434 and previous config saved to /var/cache/conftool/dbconfig/20250424-123407-fceratto.json
  • 12:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 12:26 ladsgroup@deploy1003: ladsgroup: Backport for Add support for x3 db cluster (T351820) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:21 ladsgroup@deploy1003: Started scap sync-world: Backport for Add support for x3 db cluster (T351820)
  • 12:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T391056)', diff saved to https://phabricator.wikimedia.org/P75433 and previous config saved to /var/cache/conftool/dbconfig/20250424-121819-fceratto.json
  • 12:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T391056)', diff saved to https://phabricator.wikimedia.org/P75432 and previous config saved to /var/cache/conftool/dbconfig/20250424-121756-fceratto.json
  • 12:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2046.codfw.wmnet with OS bookworm
  • 12:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2047.codfw.wmnet with OS bookworm
  • 12:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2048.codfw.wmnet with OS bookworm
  • 12:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2032.codfw.wmnet with reason: Maintenance
  • 12:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2032.codfw.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2032 T391921', diff saved to https://phabricator.wikimedia.org/P75431 and previous config saved to /var/cache/conftool/dbconfig/20250424-121152-marostegui.json
  • 12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P75430 and previous config saved to /var/cache/conftool/dbconfig/20250424-120249-fceratto.json
  • 11:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P75429 and previous config saved to /var/cache/conftool/dbconfig/20250424-114742-fceratto.json
  • 11:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T391056)', diff saved to https://phabricator.wikimedia.org/P75428 and previous config saved to /var/cache/conftool/dbconfig/20250424-113234-fceratto.json
  • 11:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T391056)', diff saved to https://phabricator.wikimedia.org/P75427 and previous config saved to /var/cache/conftool/dbconfig/20250424-111625-fceratto.json
  • 11:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 11:11 moritzm: installing python-urllib3 security updates
  • 11:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 11:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75426 and previous config saved to /var/cache/conftool/dbconfig/20250424-110230-fceratto.json
  • 10:47 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) virt.cloudgw.eqiad1.wikimediacloud.org on all recursors
  • 10:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P75425 and previous config saved to /var/cache/conftool/dbconfig/20250424-104723-fceratto.json
  • 10:47 aborrero@cumin1002: START - Cookbook sre.dns.wipe-cache virt.cloudgw.eqiad1.wikimediacloud.org on all recursors
  • 10:47 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:47 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
  • 10:47 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
  • 10:33 aborrero@cumin1002: START - Cookbook sre.dns.netbox
  • 10:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P75424 and previous config saved to /var/cache/conftool/dbconfig/20250424-103217-fceratto.json
  • 10:29 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudinstances2b-gw.openstack.eqiad1.wikimediacloud.org on all recursors
  • 10:29 aborrero@cumin1002: START - Cookbook sre.dns.wipe-cache cloudinstances2b-gw.openstack.eqiad1.wikimediacloud.org on all recursors
  • 10:27 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:26 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:26 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: neutron updates - aborrero@cumin1002"
  • 10:26 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: neutron updates - aborrero@cumin1002"
  • 10:22 aborrero@cumin1002: START - Cookbook sre.dns.netbox
  • 10:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75423 and previous config saved to /var/cache/conftool/dbconfig/20250424-101710-fceratto.json
  • 10:04 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75422 and previous config saved to /var/cache/conftool/dbconfig/20250424-100206-fceratto.json
  • 10:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 10:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T391056)', diff saved to https://phabricator.wikimedia.org/P75421 and previous config saved to /var/cache/conftool/dbconfig/20250424-100143-fceratto.json
  • 09:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P75420 and previous config saved to /var/cache/conftool/dbconfig/20250424-094635-fceratto.json
  • 09:41 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P75419 and previous config saved to /var/cache/conftool/dbconfig/20250424-093128-fceratto.json
  • 09:27 Emperor: depool thanos-fe200[1-3] pending decommissioning T391352
  • 09:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T391056)', diff saved to https://phabricator.wikimedia.org/P75418 and previous config saved to /var/cache/conftool/dbconfig/20250424-091622-fceratto.json
  • 08:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T391056)', diff saved to https://phabricator.wikimedia.org/P75417 and previous config saved to /var/cache/conftool/dbconfig/20250424-085933-fceratto.json
  • 08:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 08:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75416 and previous config saved to /var/cache/conftool/dbconfig/20250424-085911-fceratto.json
  • 08:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P75415 and previous config saved to /var/cache/conftool/dbconfig/20250424-084404-fceratto.json
  • 08:40 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75414 and previous config saved to /var/cache/conftool/dbconfig/20250424-084004-root.json
  • 08:39 moritzm: installing reprepro bugfix updates from Bookworm point release
  • 08:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P75413 and previous config saved to /var/cache/conftool/dbconfig/20250424-082857-fceratto.json
  • 08:24 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75412 and previous config saved to /var/cache/conftool/dbconfig/20250424-082458-root.json
  • 08:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75411 and previous config saved to /var/cache/conftool/dbconfig/20250424-081350-fceratto.json
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75410 and previous config saved to /var/cache/conftool/dbconfig/20250424-080953-root.json
  • 07:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75409 and previous config saved to /var/cache/conftool/dbconfig/20250424-075547-fceratto.json
  • 07:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 07:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T391056)', diff saved to https://phabricator.wikimedia.org/P75408 and previous config saved to /var/cache/conftool/dbconfig/20250424-075524-fceratto.json
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75407 and previous config saved to /var/cache/conftool/dbconfig/20250424-075448-root.json
  • 07:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P75405 and previous config saved to /var/cache/conftool/dbconfig/20250424-074016-fceratto.json
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75404 and previous config saved to /var/cache/conftool/dbconfig/20250424-073943-root.json
  • 07:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P75403 and previous config saved to /var/cache/conftool/dbconfig/20250424-072508-fceratto.json
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75402 and previous config saved to /var/cache/conftool/dbconfig/20250424-072439-root.json
  • 07:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T391056)', diff saved to https://phabricator.wikimedia.org/P75401 and previous config saved to /var/cache/conftool/dbconfig/20250424-071001-fceratto.json
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75400 and previous config saved to /var/cache/conftool/dbconfig/20250424-070933-root.json
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75399 and previous config saved to /var/cache/conftool/dbconfig/20250424-065428-root.json
  • 06:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 06:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T391056)', diff saved to https://phabricator.wikimedia.org/P75398 and previous config saved to /var/cache/conftool/dbconfig/20250424-065227-fceratto.json
  • 06:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 06:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 06:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 06:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75397 and previous config saved to /var/cache/conftool/dbconfig/20250424-065149-fceratto.json
  • 06:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1257.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1256.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1255.eqiad.wmnet with reason: Maintenance
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75396 and previous config saved to /var/cache/conftool/dbconfig/20250424-063922-root.json
  • 06:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P75395 and previous config saved to /var/cache/conftool/dbconfig/20250424-063643-fceratto.json
  • 06:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1027.eqiad.wmnet with reason: Maintenance
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1027 T391921', diff saved to https://phabricator.wikimedia.org/P75394 and previous config saved to /var/cache/conftool/dbconfig/20250424-063345-marostegui.json
  • 06:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P75393 and previous config saved to /var/cache/conftool/dbconfig/20250424-062135-fceratto.json
  • 06:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75392 and previous config saved to /var/cache/conftool/dbconfig/20250424-060628-fceratto.json
  • 05:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75391 and previous config saved to /var/cache/conftool/dbconfig/20250424-054831-fceratto.json
  • 05:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 05:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T391056)', diff saved to https://phabricator.wikimedia.org/P75390 and previous config saved to /var/cache/conftool/dbconfig/20250424-054808-fceratto.json
  • 05:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P75389 and previous config saved to /var/cache/conftool/dbconfig/20250424-053301-fceratto.json
  • 05:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P75388 and previous config saved to /var/cache/conftool/dbconfig/20250424-051753-fceratto.json
  • 05:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T391056)', diff saved to https://phabricator.wikimedia.org/P75387 and previous config saved to /var/cache/conftool/dbconfig/20250424-050247-fceratto.json
  • 04:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T391056)', diff saved to https://phabricator.wikimedia.org/P75386 and previous config saved to /var/cache/conftool/dbconfig/20250424-044153-fceratto.json
  • 04:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 04:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T391056)', diff saved to https://phabricator.wikimedia.org/P75385 and previous config saved to /var/cache/conftool/dbconfig/20250424-044130-fceratto.json
  • 04:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P75384 and previous config saved to /var/cache/conftool/dbconfig/20250424-042623-fceratto.json
  • 04:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P75383 and previous config saved to /var/cache/conftool/dbconfig/20250424-041116-fceratto.json
  • 03:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T391056)', diff saved to https://phabricator.wikimedia.org/P75382 and previous config saved to /var/cache/conftool/dbconfig/20250424-035609-fceratto.json
  • 03:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T391056)', diff saved to https://phabricator.wikimedia.org/P75381 and previous config saved to /var/cache/conftool/dbconfig/20250424-033724-fceratto.json
  • 03:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 03:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T391056)', diff saved to https://phabricator.wikimedia.org/P75380 and previous config saved to /var/cache/conftool/dbconfig/20250424-033701-fceratto.json
  • 03:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P75379 and previous config saved to /var/cache/conftool/dbconfig/20250424-032154-fceratto.json
  • 03:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P75378 and previous config saved to /var/cache/conftool/dbconfig/20250424-030647-fceratto.json
  • 02:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T391056)', diff saved to https://phabricator.wikimedia.org/P75377 and previous config saved to /var/cache/conftool/dbconfig/20250424-025140-fceratto.json
  • 02:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T391056)', diff saved to https://phabricator.wikimedia.org/P75376 and previous config saved to /var/cache/conftool/dbconfig/20250424-023220-fceratto.json
  • 02:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 02:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 02:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 02:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T391056)', diff saved to https://phabricator.wikimedia.org/P75375 and previous config saved to /var/cache/conftool/dbconfig/20250424-020328-fceratto.json
  • 01:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P75374 and previous config saved to /var/cache/conftool/dbconfig/20250424-014821-fceratto.json
  • 01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e3-codfw
  • 01:40 pt1979@cumin2002: START - Cookbook sre.network.tls for network device lsw1-e3-codfw
  • 01:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2049.codfw.wmnet with OS bookworm
  • 01:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:33 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-e3-codfw.mgmt.codfw.wmnet
  • 01:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P75373 and previous config saved to /var/cache/conftool/dbconfig/20250424-013313-fceratto.json
  • 01:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2045.codfw.wmnet with OS bookworm
  • 01:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2049.codfw.wmnet with reason: host reimage
  • 01:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T391056)', diff saved to https://phabricator.wikimedia.org/P75372 and previous config saved to /var/cache/conftool/dbconfig/20250424-011807-fceratto.json
  • 01:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2049.codfw.wmnet with reason: host reimage
  • 01:10 eileen: config revision changed from bfbce54f to 47a5d384
  • 01:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T391056)', diff saved to https://phabricator.wikimedia.org/P75371 and previous config saved to /var/cache/conftool/dbconfig/20250424-010217-fceratto.json
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e3-codfw - pt1979@cumin2002"
  • 01:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1251.eqiad.wmnet with reason: Maintenance
  • 01:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2049.codfw.wmnet with OS bookworm
  • 01:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
  • 01:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
  • 01:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2046.codfw.wmnet with OS bookworm
  • 01:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
  • 00:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e3-codfw - pt1979@cumin2002"
  • 00:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:54 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:54 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 00:54 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:54 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-e3-codfw.mgmt.codfw.wmnet
  • 00:53 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e1-codfw
  • 00:53 pt1979@cumin2002: START - Cookbook sre.network.tls for network device lsw1-e1-codfw
  • 00:52 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-e1-codfw.mgmt.codfw.wmnet
  • 00:46 eileen: civicrm upgraded from b3038510 to c8946ea5
  • 00:45 eileen: config revision changed from c635ed3c to bfbce54f
  • 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-codfw
  • 00:43 pt1979@cumin2002: START - Cookbook sre.network.tls for network device ssw1-e1-codfw
  • 00:32 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-e1-codfw.mgmt.codfw.wmnet
  • 00:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 00:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T391056)', diff saved to https://phabricator.wikimedia.org/P75370 and previous config saved to /var/cache/conftool/dbconfig/20250424-003043-fceratto.json
  • 00:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e1-codfw - pt1979@cumin2002"
  • 00:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e1-codfw - pt1979@cumin2002"
  • 00:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:17 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-e1-codfw.mgmt.codfw.wmnet
  • 00:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-e1-codfw.mgmt.codfw.wmnet
  • 00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for lsw1-e1-codfw - pt1979@cumin2002"
  • 00:17 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for lsw1-e1-codfw - pt1979@cumin2002"
  • 00:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P75369 and previous config saved to /var/cache/conftool/dbconfig/20250424-001535-fceratto.json
  • 00:12 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P75368 and previous config saved to /var/cache/conftool/dbconfig/20250424-000028-fceratto.json

2025-04-23

  • 23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:56 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-e1-codfw.mgmt.codfw.wmnet
  • 23:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T391056)', diff saved to https://phabricator.wikimedia.org/P75367 and previous config saved to /var/cache/conftool/dbconfig/20250423-234521-fceratto.json
  • 23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e1-codfw - pt1979@cumin2002"
  • 23:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e1-codfw - pt1979@cumin2002"
  • 23:37 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:37 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-e1-codfw.mgmt.codfw.wmnet
  • 23:36 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-e1-codfw.mgmt.codfw.wmnet
  • 22:58 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-e1-codfw - pt1979@cumin2002"
  • 22:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-e1-codfw - pt1979@cumin2002"
  • 22:54 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 22:53 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:53 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-e1-codfw.mgmt.codfw.wmnet
  • 22:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-e1-codfw.mgmt.codfw.wmnet
  • 22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-e1-codfw - pt1979@cumin2002"
  • 22:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-e1-codfw - pt1979@cumin2002"
  • 22:46 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2061.codfw.wmnet with OS bullseye
  • 22:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T391056)', diff saved to https://phabricator.wikimedia.org/P75366 and previous config saved to /var/cache/conftool/dbconfig/20250423-223359-fceratto.json
  • 22:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 22:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T391056)', diff saved to https://phabricator.wikimedia.org/P75365 and previous config saved to /var/cache/conftool/dbconfig/20250423-223336-fceratto.json
  • 22:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2061.codfw.wmnet with reason: host reimage
  • 22:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P75364 and previous config saved to /var/cache/conftool/dbconfig/20250423-221828-fceratto.json
  • 22:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2061.codfw.wmnet with reason: host reimage
  • 22:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2096.codfw.wmnet with OS bullseye
  • 22:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P75363 and previous config saved to /var/cache/conftool/dbconfig/20250423-220321-fceratto.json
  • 22:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2061
  • 22:03 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2061
  • 22:02 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2061
  • 22:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2061.codfw.wmnet 143.0.192.10.in-addr.arpa 3.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 22:02 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2061.codfw.wmnet 143.0.192.10.in-addr.arpa 3.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 22:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:02 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2061 - bking@cumin2002"
  • 22:02 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2061 - bking@cumin2002"
  • 21:58 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:57 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2061
  • 21:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2061.codfw.wmnet with OS bullseye
  • 21:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2061 to cirrussearch2061
  • 21:55 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2061
  • 21:54 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2061
  • 21:54 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:54 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2061 to cirrussearch2061 - bking@cumin2002"
  • 21:54 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2061 to cirrussearch2061 - bking@cumin2002"
  • 21:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2096.codfw.wmnet with reason: host reimage
  • 21:50 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2096.codfw.wmnet with reason: host reimage
  • 21:50 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2061 to cirrussearch2061
  • 21:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T391056)', diff saved to https://phabricator.wikimedia.org/P75362 and previous config saved to /var/cache/conftool/dbconfig/20250423-214814-fceratto.json
  • 21:37 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=eqiad,name=restbase1043.eqiad.wmnet
  • 21:36 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=eqiad,name=restbase1028.eqiad.wmnet
  • 21:32 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2096
  • 21:32 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2096
  • 21:32 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2096
  • 21:32 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2096.codfw.wmnet 233.16.192.10.in-addr.arpa 3.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:32 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2096.codfw.wmnet 233.16.192.10.in-addr.arpa 3.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:32 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:32 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2096 - bking@cumin2002"
  • 21:32 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2096 - bking@cumin2002"
  • 21:28 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T391056)', diff saved to https://phabricator.wikimedia.org/P75361 and previous config saved to /var/cache/conftool/dbconfig/20250423-212818-fceratto.json
  • 21:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 21:28 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2096
  • 21:28 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2096.codfw.wmnet with OS bullseye
  • 21:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T391056)', diff saved to https://phabricator.wikimedia.org/P75360 and previous config saved to /var/cache/conftool/dbconfig/20250423-212756-fceratto.json
  • 21:15 jforrester@deploy1003: Finished scap sync-world: Backport for [wikifunctionswiki] Enable Parsoid in wikitext articles, tests: Add a Wikifunctions-related test suite (duration: 11m 33s)
  • 21:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P75359 and previous config saved to /var/cache/conftool/dbconfig/20250423-211249-fceratto.json
  • 20:57 dancy@deploy1003: Installation of scap version "4.155.0" completed for 186 hosts
  • 20:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P75358 and previous config saved to /var/cache/conftool/dbconfig/20250423-205743-fceratto.json
  • 20:53 dancy@deploy1003: Installing scap version "4.155.0" for 186 host(s)
  • 20:49 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2096']
  • 20:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cirrussearch2096']
  • 20:45 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on aphlict2001.codfw.wmnet with reason: Bookworm Re-image
  • 20:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T391056)', diff saved to https://phabricator.wikimedia.org/P75357 and previous config saved to /var/cache/conftool/dbconfig/20250423-204236-fceratto.json
  • 20:41 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2096']
  • 20:38 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cirrussearch2096']
  • 20:33 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2096']
  • 20:22 cscott@deploy1003: Finished scap sync-world: Backport for Turn on ParsoidFragmentInput; remove unneeded ParsoidFragmentSupport config (T268144) (duration: 15m 19s)
  • 20:22 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@4a7644d]: Deploy hotfix for T391283. (duration: 01m 04s)
  • 20:21 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@4a7644d]: Deploy hotfix for T391283.
  • 20:15 cscott@deploy1003: cscott: Continuing with sync
  • 20:12 cscott@deploy1003: cscott: Backport for Turn on ParsoidFragmentInput; remove unneeded ParsoidFragmentSupport config (T268144) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:08 brett: import libvmod-netmapper-1.10-1 into bullseye-wikimedia (T392533)
  • 20:07 cscott@deploy1003: Started scap sync-world: Backport for Turn on ParsoidFragmentInput; remove unneeded ParsoidFragmentSupport config (T268144)
  • 20:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T391056)', diff saved to https://phabricator.wikimedia.org/P75356 and previous config saved to /var/cache/conftool/dbconfig/20250423-200358-fceratto.json
  • 20:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 20:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75355 and previous config saved to /var/cache/conftool/dbconfig/20250423-200346-fceratto.json
  • 19:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P75354 and previous config saved to /var/cache/conftool/dbconfig/20250423-194839-fceratto.json
  • 19:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P75353 and previous config saved to /var/cache/conftool/dbconfig/20250423-193332-fceratto.json
  • 19:31 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@7312379]: Release DAGs for T391283. (duration: 00m 54s)
  • 19:30 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@7312379]: Release DAGs for T391283.
  • 19:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75352 and previous config saved to /var/cache/conftool/dbconfig/20250423-191825-fceratto.json
  • 19:09 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 19:01 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75351 and previous config saved to /var/cache/conftool/dbconfig/20250423-190116-fceratto.json
  • 19:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 19:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T391056)', diff saved to https://phabricator.wikimedia.org/P75350 and previous config saved to /var/cache/conftool/dbconfig/20250423-190054-fceratto.json
  • 18:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-e1-codfw - pt1979@cumin2002"
  • 18:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-e1-codfw - pt1979@cumin2002"
  • 18:56 brett: import libvmod-wmfuniq-0.1.0~deb12u1 and wmfuniq-keygen-0.1.0~deb12u1 into bookworm-wikimedia (T392059)
  • 18:56 brett: import libvmod-wmfuniq-0.1.0~deb11u1 and wmfuniq-keygen-0.1.0~deb11u1 into bullseye-wikimedia (T392059)
  • 18:51 dzahn@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 18:51 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 18:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-e1-codfw.mgmt.codfw.wmnet
  • 18:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P75349 and previous config saved to /var/cache/conftool/dbconfig/20250423-184547-fceratto.json
  • 18:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-e1-codfw.mgmt.codfw.wmnet
  • 18:43 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-e1-codfw.mgmt.codfw.wmnet
  • 18:43 brett: remove libvmod-wmfuniq-0.1.0 and wmfuniq-keygen-0.1.0 from bullseye-wikimedia (T392059)
  • 18:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P75348 and previous config saved to /var/cache/conftool/dbconfig/20250423-183040-fceratto.json
  • 18:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T391056)', diff saved to https://phabricator.wikimedia.org/P75347 and previous config saved to /var/cache/conftool/dbconfig/20250423-181533-fceratto.json
  • 18:09 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2113.codfw.wmnet with OS bullseye
  • 18:04 brett: import libvmod-wmfuniq 0.1.0 into bullseye-wikimedia (T392059)
  • 17:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T391056)', diff saved to https://phabricator.wikimedia.org/P75346 and previous config saved to /var/cache/conftool/dbconfig/20250423-175948-fceratto.json
  • 17:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 17:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T391056)', diff saved to https://phabricator.wikimedia.org/P75345 and previous config saved to /var/cache/conftool/dbconfig/20250423-175926-fceratto.json
  • 17:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P75344 and previous config saved to /var/cache/conftool/dbconfig/20250423-174419-fceratto.json
  • 17:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:39 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2071.codfw.wmnet|cirrussearch2098.codfw.wmnet|cirrussearch2099.codfw.wmnet|cirrussearch2101.codfw.wmnet|cirrussearch2102.codfw.wmnet|cirrussearch2113.codfw.wmnet
  • 17:38 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P75343 and previous config saved to /var/cache/conftool/dbconfig/20250423-172912-fceratto.json
  • 17:21 brett: Remove libvarnishapi-dev from bookworm-wikimedia
  • 17:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T391056)', diff saved to https://phabricator.wikimedia.org/P75342 and previous config saved to /var/cache/conftool/dbconfig/20250423-171404-fceratto.json
  • 17:04 brett: Remove varnish libvmod-re2 libvmod-netmapper libvmod-querysort libvarnishapi2 varnish-modules varnishkafka from bookworm-wikimedia
  • 16:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T391056)', diff saved to https://phabricator.wikimedia.org/P75341 and previous config saved to /var/cache/conftool/dbconfig/20250423-165634-fceratto.json
  • 16:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 16:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75340 and previous config saved to /var/cache/conftool/dbconfig/20250423-165611-fceratto.json
  • 16:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2113
  • 16:52 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2113
  • 16:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2113.codfw.wmnet with OS bullseye
  • 16:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2113 to cirrussearch2113
  • 16:50 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2113
  • 16:50 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2113
  • 16:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:50 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2113 to cirrussearch2113 - bking@cumin2002"
  • 16:47 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2113 to cirrussearch2113 - bking@cumin2002"
  • 16:43 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:42 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2113 to cirrussearch2113
  • 16:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P75338 and previous config saved to /var/cache/conftool/dbconfig/20250423-164105-fceratto.json
  • 16:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2102.codfw.wmnet with OS bullseye
  • 16:28 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:28 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:27 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Enable temporary-account-viewer group on all WMF production wikis (T390942 T387205) (duration: 11m 11s)
  • 16:27 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P75337 and previous config saved to /var/cache/conftool/dbconfig/20250423-162558-fceratto.json
  • 16:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:24 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75336 and previous config saved to /var/cache/conftool/dbconfig/20250423-162434-root.json
  • 16:24 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:23 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1184
  • 16:23 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1184
  • 16:21 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:21 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 16:21 dreamyjazz@deploy1003: dreamyjazz: Backport for Enable temporary-account-viewer group on all WMF production wikis (T390942 T387205) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:19 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 16:18 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 16:16 dreamyjazz@deploy1003: Started scap sync-world: Backport for Enable temporary-account-viewer group on all WMF production wikis (T390942 T387205)
  • 16:13 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 16:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75335 and previous config saved to /var/cache/conftool/dbconfig/20250423-161051-fceratto.json
  • 16:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2101.codfw.wmnet with OS bullseye
  • 16:10 dreamyjazz@deploy1003: dreamyjazz: Backport for Enable temporary-account-viewer group on all WMF production wikis (T390942 T387205) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2102.codfw.wmnet with reason: host reimage
  • 16:09 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 16:09 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75334 and previous config saved to /var/cache/conftool/dbconfig/20250423-160928-root.json
  • 16:08 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 16:07 dreamyjazz@deploy1003: Started scap sync-world: Backport for Enable temporary-account-viewer group on all WMF production wikis (T390942 T387205)
  • 16:06 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2102.codfw.wmnet with reason: host reimage
  • 16:04 vgutierrez: restarting pybal on lvs201[34]
  • 16:01 dancy@deploy1003: Installation of scap version "4.154.0" completed for 2 hosts
  • 16:00 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 16:00 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 16:00 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 16:00 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 16:00 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 15:59 dancy@deploy1003: Installing scap version "4.154.0" for 2 host(s)
  • 15:58 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 15:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75333 and previous config saved to /var/cache/conftool/dbconfig/20250423-155423-root.json
  • 15:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75332 and previous config saved to /var/cache/conftool/dbconfig/20250423-155423-fceratto.json
  • 15:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 15:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T391056)', diff saved to https://phabricator.wikimedia.org/P75331 and previous config saved to /var/cache/conftool/dbconfig/20250423-155401-fceratto.json
  • 15:52 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 15:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2102
  • 15:50 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2102
  • 15:48 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2102
  • 15:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2102.codfw.wmnet 221.32.192.10.in-addr.arpa 1.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:48 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2102.codfw.wmnet 221.32.192.10.in-addr.arpa 1.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2102 - bking@cumin2002"
  • 15:48 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2102 - bking@cumin2002"
  • 15:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2101.codfw.wmnet with reason: host reimage
  • 15:44 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:43 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2102
  • 15:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2102.codfw.wmnet with OS bullseye
  • 15:43 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2101.codfw.wmnet with reason: host reimage
  • 15:41 ladsgroup@dns1004: END - running authdns-update
  • 15:39 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75327 and previous config saved to /var/cache/conftool/dbconfig/20250423-153918-root.json
  • 15:39 ladsgroup@dns1004: START - running authdns-update
  • 15:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P75326 and previous config saved to /var/cache/conftool/dbconfig/20250423-153854-fceratto.json
  • 15:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2102 to cirrussearch2102
  • 15:29 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2102
  • 15:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2102
  • 15:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2102 to cirrussearch2102 - bking@cumin2002"
  • 15:28 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2102 to cirrussearch2102 - bking@cumin2002"
  • 15:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2101
  • 15:26 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2101
  • 15:25 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2101
  • 15:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2101.codfw.wmnet 220.32.192.10.in-addr.arpa 0.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:25 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2101.codfw.wmnet 220.32.192.10.in-addr.arpa 0.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2101 - bking@cumin2002"
  • 15:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2101 - bking@cumin2002"
  • 15:24 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75324 and previous config saved to /var/cache/conftool/dbconfig/20250423-152412-root.json
  • 15:24 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2102 to cirrussearch2102
  • 15:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P75323 and previous config saved to /var/cache/conftool/dbconfig/20250423-152347-fceratto.json
  • 15:09 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwdebug1001.eqiad.wmnet with OS bullseye
  • 15:09 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75322 and previous config saved to /var/cache/conftool/dbconfig/20250423-150907-root.json
  • 15:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T391056)', diff saved to https://phabricator.wikimedia.org/P75321 and previous config saved to /var/cache/conftool/dbconfig/20250423-150839-fceratto.json
  • 14:55 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2101
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2101.codfw.wmnet with OS bullseye
  • 14:54 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2101.codfw.wmnet on all recursors
  • 14:54 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2101.codfw.wmnet on all recursors
  • 14:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75320 and previous config saved to /var/cache/conftool/dbconfig/20250423-145401-root.json
  • 14:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2101 to cirrussearch2101
  • 14:52 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2101
  • 14:51 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2101
  • 14:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2101 to cirrussearch2101 - bking@cumin2002"
  • 14:48 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Remove wgCheckUserCentralIndexRangesToExclude definition (T389055) (duration: 11m 00s)
  • 14:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T391056)', diff saved to https://phabricator.wikimedia.org/P75319 and previous config saved to /var/cache/conftool/dbconfig/20250423-144811-fceratto.json
  • 14:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 14:47 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2101 to cirrussearch2101 - bking@cumin2002"
  • 14:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T391056)', diff saved to https://phabricator.wikimedia.org/P75318 and previous config saved to /var/cache/conftool/dbconfig/20250423-144741-fceratto.json
  • 14:42 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 14:42 dreamyjazz@deploy1003: dreamyjazz: Backport for Remove wgCheckUserCentralIndexRangesToExclude definition (T389055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2099.codfw.wmnet with OS bullseye
  • 14:38 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75317 and previous config saved to /var/cache/conftool/dbconfig/20250423-143856-root.json
  • 14:37 dreamyjazz@deploy1003: Started scap sync-world: Backport for Remove wgCheckUserCentralIndexRangesToExclude definition (T389055)
  • 14:37 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug1001.eqiad.wmnet with reason: host reimage
  • 14:33 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug1001.eqiad.wmnet with reason: host reimage
  • 14:33 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:32 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2101 to cirrussearch2101
  • 14:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P75316 and previous config saved to /var/cache/conftool/dbconfig/20250423-143235-fceratto.json
  • 14:30 jforrester@deploy1003: Finished scap sync-world: Backport for ZString: Don't explode if we're handed an array with odd contents (T392370), API: Don't try to read fetchAllZLanguageCodes() in client-mode Action APIs either (T392014) (duration: 11m 29s)
  • 14:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2098.codfw.wmnet with OS bullseye
  • 14:23 jforrester@deploy1003: jforrester: Continuing with sync
  • 14:23 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75315 and previous config saved to /var/cache/conftool/dbconfig/20250423-142350-root.json
  • 14:23 jforrester@deploy1003: jforrester: Backport for ZString: Don't explode if we're handed an array with odd contents (T392370), API: Don't try to read fetchAllZLanguageCodes() in client-mode Action APIs either (T392014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2099.codfw.wmnet with reason: host reimage
  • 14:19 jforrester@deploy1003: Started scap sync-world: Backport for ZString: Don't explode if we're handed an array with odd contents (T392370), API: Don't try to read fetchAllZLanguageCodes() in client-mode Action APIs either (T392014)
  • 14:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P75314 and previous config saved to /var/cache/conftool/dbconfig/20250423-141728-fceratto.json
  • 14:15 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2099.codfw.wmnet with reason: host reimage
  • 14:14 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mwdebug1001.eqiad.wmnet with OS bullseye
  • 14:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 14:12 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:12 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1032', diff saved to https://phabricator.wikimedia.org/P75313 and previous config saved to /var/cache/conftool/dbconfig/20250423-141000-marostegui.json
  • 14:07 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:07 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:03 samtar@deploy1003: Finished scap sync-world: Backport for Remove temporary '-php8' and '-k8s' suffixes from ArcLamp pipeline (T391516) (duration: 13m 36s)
  • 14:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T391056)', diff saved to https://phabricator.wikimedia.org/P75312 and previous config saved to /var/cache/conftool/dbconfig/20250423-140221-fceratto.json
  • 13:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2099
  • 13:59 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2099
  • 13:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2099
  • 13:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2099.codfw.wmnet 218.32.192.10.in-addr.arpa 8.1.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2099.codfw.wmnet 218.32.192.10.in-addr.arpa 8.1.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2099 - bking@cumin2002"
  • 13:58 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2099 - bking@cumin2002"
  • 13:57 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug1001.eqiad.wmnet
  • 13:56 samtar@deploy1003: ori, samtar: Continuing with sync
  • 13:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2098.codfw.wmnet with reason: host reimage
  • 13:54 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:54 samtar@deploy1003: ori, samtar: Backport for Remove temporary '-php8' and '-k8s' suffixes from ArcLamp pipeline (T391516) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2099
  • 13:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2099.codfw.wmnet with OS bullseye
  • 13:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2098.codfw.wmnet with reason: host reimage
  • 13:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2099.codfw.wmnet on all recursors
  • 13:53 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2099.codfw.wmnet on all recursors
  • 13:49 samtar@deploy1003: Started scap sync-world: Backport for Remove temporary '-php8' and '-k8s' suffixes from ArcLamp pipeline (T391516)
  • 13:47 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2099 to cirrussearch2099
  • 13:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2099
  • 13:43 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2099
  • 13:43 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2099 to cirrussearch2099 - bking@cumin2002"
  • 13:43 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2099 to cirrussearch2099 - bking@cumin2002"
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T391056)', diff saved to https://phabricator.wikimedia.org/P75310 and previous config saved to /var/cache/conftool/dbconfig/20250423-134142-fceratto.json
  • 13:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T391056)', diff saved to https://phabricator.wikimedia.org/P75309 and previous config saved to /var/cache/conftool/dbconfig/20250423-134131-fceratto.json
  • 13:39 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:38 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2099 to cirrussearch2099
  • 13:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2098
  • 13:36 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2098
  • 13:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2098
  • 13:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2098.codfw.wmnet 217.32.192.10.in-addr.arpa 7.1.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:36 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2098.codfw.wmnet 217.32.192.10.in-addr.arpa 7.1.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:36 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2098 - bking@cumin2002"
  • 13:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2098 - bking@cumin2002"
  • 13:34 tgr_: T392462 Ran fixStuckGlobalRename.php for two users
  • 13:31 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:31 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2098
  • 13:31 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2098.codfw.wmnet with OS bullseye
  • 13:31 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2098.codfw.wmnet on all recursors
  • 13:31 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2098.codfw.wmnet on all recursors
  • 13:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2098 to cirrussearch2098
  • 13:29 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 13:29 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 13:29 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 13:29 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2098
  • 13:28 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 13:28 samtar@deploy1003: Finished scap sync-world: Backport for Add throttle exemptions for some Edit-a-thons (T391764 T391999) (duration: 11m 42s)
  • 13:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2098
  • 13:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2098 to cirrussearch2098 - bking@cumin2002"
  • 13:26 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2098 to cirrussearch2098 - bking@cumin2002"
  • 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P75308 and previous config saved to /var/cache/conftool/dbconfig/20250423-132624-fceratto.json
  • 13:23 moritzm: installing Linux 6.1.133 on Bookworm hosts
  • 13:22 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:22 samtar@deploy1003: superpes, samtar: Continuing with sync
  • 13:22 samtar@deploy1003: superpes, samtar: Backport for Add throttle exemptions for some Edit-a-thons (T391764 T391999) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:21 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2098 to cirrussearch2098
  • 13:18 TheresNoTime: samtar@deploy1003 Finished scap sync-world: Backport for SUL3: Remove unused CentralAuthSharedDomainPrefix config setting, Simplify CentralAuthEnableSul3 config setting value (duration: 11m 28s)
  • 13:05 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1003.eqiad.wmnet
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 13:03 samtar@deploy1003: Started scap sync-world: Backport for SUL3: Remove unused CentralAuthSharedDomainPrefix config setting, Simplify CentralAuthEnableSul3 config setting value
  • 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1002.eqiad.wmnet
  • 13:02 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
  • 13:01 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
  • 12:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T391056)', diff saved to https://phabricator.wikimedia.org/P75306 and previous config saved to /var/cache/conftool/dbconfig/20250423-125611-fceratto.json
  • 12:54 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
  • 12:44 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:44 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:42 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1004.eqiad.wmnet
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 12:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T391056)', diff saved to https://phabricator.wikimedia.org/P75305 and previous config saved to /var/cache/conftool/dbconfig/20250423-123640-fceratto.json
  • 12:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 12:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T391056)', diff saved to https://phabricator.wikimedia.org/P75304 and previous config saved to /var/cache/conftool/dbconfig/20250423-123617-fceratto.json
  • 12:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:35 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1004.eqiad.wmnet
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc7 T391454', diff saved to https://phabricator.wikimedia.org/P75303 and previous config saved to /var/cache/conftool/dbconfig/20250423-122924-marostegui.json
  • 12:27 hashar: gerrit: removed obsolete 1024px-Sea_and_sky_light.cache.jpg file from all servers. File was replaced by 2006-12-28_10h26_33.jpg # T392479
  • 12:26 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 12:21 cmooney@dns2005: END - running authdns-update
  • 12:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P75302 and previous config saved to /var/cache/conftool/dbconfig/20250423-122110-fceratto.json
  • 12:19 cmooney@dns2005: START - running authdns-update
  • 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 12:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2017.codfw.wmnet,pc1017.eqiad.wmnet with reason: Maintenance
  • 12:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc7 T391454', diff saved to https://phabricator.wikimedia.org/P75301 and previous config saved to /var/cache/conftool/dbconfig/20250423-121722-marostegui.json
  • 12:17 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 12:10 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 12:10 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 12:09 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 12:09 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 12:08 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:08 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P75298 and previous config saved to /var/cache/conftool/dbconfig/20250423-120602-fceratto.json
  • 11:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T391056)', diff saved to https://phabricator.wikimedia.org/P75297 and previous config saved to /var/cache/conftool/dbconfig/20250423-115054-fceratto.json
  • 11:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T391056)', diff saved to https://phabricator.wikimedia.org/P75296 and previous config saved to /var/cache/conftool/dbconfig/20250423-113200-fceratto.json
  • 11:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T391056)', diff saved to https://phabricator.wikimedia.org/P75295 and previous config saved to /var/cache/conftool/dbconfig/20250423-113148-fceratto.json
  • 11:27 moritzm: installing libxml2 security updates
  • 11:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P75294 and previous config saved to /var/cache/conftool/dbconfig/20250423-111641-fceratto.json
  • 11:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P75293 and previous config saved to /var/cache/conftool/dbconfig/20250423-110134-fceratto.json
  • 10:53 moritzm: installing php8.2 security updates
  • 10:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T391056)', diff saved to https://phabricator.wikimedia.org/P75292 and previous config saved to /var/cache/conftool/dbconfig/20250423-104627-fceratto.json
  • 10:42 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 10:41 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 10:41 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 10:41 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 10:41 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 10:40 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 10:40 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:40 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 10:39 hnowlan: migrating various minor mobileapps/PCS APIs to serve via the rest-gateway instead of restbase
  • 10:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T391056)', diff saved to https://phabricator.wikimedia.org/P75291 and previous config saved to /var/cache/conftool/dbconfig/20250423-102752-fceratto.json
  • 10:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:06 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1004.eqiad.wmnet
  • 09:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.3.0.e.f.0.0.0.a.0.8.c.e.2.0.a.2.ip6.arpa on all recursors
  • 09:57 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.3.0.e.f.0.0.0.a.0.8.c.e.2.0.a.2.ip6.arpa on all recursors
  • 09:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.3.0.e.f.0.0.0.a.0.8.c.e.2.0.a.2.ip6.arpa on all recursors
  • 09:57 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.3.0.e.f.0.0.0.a.0.8.c.e.2.0.a.2.ip6.arpa on all recursors
  • 09:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:56 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: correct dns record for cloudgw vip eqiad - cmooney@cumin1002"
  • 09:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: correct dns record for cloudgw vip eqiad - cmooney@cumin1002"
  • 09:52 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 09:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 09:29 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 09:10 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1004.eqiad.wmnet
  • 09:04 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1004.eqiad.wmnet
  • 08:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2001.codfw.wmnet
  • 08:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2001.codfw.wmnet
  • 08:18 moritzm: installing openjpeg2 security updates
  • 08:02 taavi@deploy1003: Finished scap sync-world: Backport for Add WMCS v6 range to relevant exclusions (T386689) (duration: 11m 58s)
  • 07:56 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 07:56 taavi@deploy1003: taavi: Continuing with sync
  • 07:55 taavi@deploy1003: taavi: Backport for Add WMCS v6 range to relevant exclusions (T386689) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:52 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
  • 07:51 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
  • 07:50 taavi@deploy1003: Started scap sync-world: Backport for Add WMCS v6 range to relevant exclusions (T386689)
  • 07:46 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 07:41 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
  • 07:36 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 07:33 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 07:28 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
  • 07:28 elukey: reboot ml-serve-ctrl* VMs to pick up new cpu/memory settings - T392289
  • 07:27 elukey: elukey@ganeti1048:~$ sudo gnt-instance modify -B memory=6g,vcpus=4 ml-serve-ctrl1001.eqiad.wmnet - T392289
  • 07:27 elukey: elukey@ganeti1048:~$ sudo gnt-instance modify -B memory=6g,vcpus=4 ml-serve-ctrl1002.eqiad.wmnet - T392289
  • 07:27 elukey: elukey@ganeti2032:~$ sudo gnt-instance modify -B memory=6g,vcpus=4 ml-serve-ctrl2002.codfw.wmnet - T392289
  • 07:26 elukey: elukey@ganeti2032:~$ sudo gnt-instance modify -B memory=6g,vcpus=4 ml-serve-ctrl2001.codfw.wmnet - T392289
  • 07:24 kartik@deploy1003: Finished scap sync-world: Backport for Add channel for ContentTranslation logging (T391311) (duration: 16m 53s)
  • 07:19 moritzm: installing libapache2-mod-auth-openidc security updates
  • 07:17 kartik@deploy1003: abi, kartik: Continuing with sync
  • 07:12 kartik@deploy1003: abi, kartik: Backport for Add channel for ContentTranslation logging (T391311) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:07 kartik@deploy1003: Started scap sync-world: Backport for Add channel for ContentTranslation logging (T391311)
  • 06:31 moritzm: installing erlang security updates

2025-04-22

  • 23:56 TimStarling: running cleanupBlocks.php on all wikis T389301
  • 04:02 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.23 (duration: 02m 48s)
  • 00:25 reedy@deploy1003: Synchronized wmf-config/InitialiseSettings-labs.php: Fix syntax (duration: 10m 53s)
  • 00:03 reedy@deploy1003: Synchronized wmf-config/InitialiseSettings-labs.php: Fix syntax (duration: 11m 02s)

2025-04-21

  • 23:12 cstone: civicrm upgraded from d7eefbc4 to b3038510
  • 22:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (6 nodes at a time) for ElasticSearch cluster search_codfw: test manual mode - ryankemper@cumin2002 - T388610
  • 22:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2071.codfw.wmnet with OS bullseye
  • 21:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2071.codfw.wmnet with reason: host reimage
  • 21:51 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2071.codfw.wmnet with reason: host reimage
  • 21:38 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (6 nodes at a time) for ElasticSearch cluster search_codfw: test manual mode - ryankemper@cumin2002 - T388610
  • 21:37 ryankemper@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (6 nodes at a time) for ElasticSearch cluster search_codfw: test manual mode - ryankemper@cumin2002 - T388610
  • 21:37 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (6 nodes at a time) for ElasticSearch cluster search_codfw: test manual mode - ryankemper@cumin2002 - T388610
  • 21:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2071
  • 21:36 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2071
  • 21:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2071
  • 21:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2071.codfw.wmnet 70.32.192.10.in-addr.arpa 0.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:35 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2071.codfw.wmnet 70.32.192.10.in-addr.arpa 0.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:35 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2071 - bking@cumin2002"
  • 21:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2071 - bking@cumin2002"
  • 21:31 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:31 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2071
  • 21:31 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2071.codfw.wmnet with OS bullseye
  • 21:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2071 to cirrussearch2071
  • 21:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2071
  • 21:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2071
  • 21:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2071 to cirrussearch2071 - bking@cumin2002"
  • 21:27 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2071 to cirrussearch2071 - bking@cumin2002"
  • 21:23 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2071 to cirrussearch2071
  • 21:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2096 to cirrussearch2096
  • 21:14 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2096
  • 21:14 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2096
  • 21:14 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:14 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2096 to cirrussearch2096 - bking@cumin2002"
  • 21:08 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2096 to cirrussearch2096 - bking@cumin2002"
  • 21:03 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:03 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2096 to cirrussearch2096
  • 20:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2110.codfw.wmnet with OS bullseye
  • 20:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2095.codfw.wmnet with OS bullseye
  • 20:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2110.codfw.wmnet with reason: host reimage
  • 20:36 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2110.codfw.wmnet with reason: host reimage
  • 20:29 jdrewniak@deploy1003: Finished scap sync-world: Backport for Design Research Participant Survey: Deploy (T392325), Enable reading list beta feature for beta cluster (T390881), Create EventStream configuration for PES1.3 Wikirun Game (duration: 18m 44s)
  • 20:22 jdrewniak@deploy1003: jdrewniak, dani, bwang: Continuing with sync
  • 20:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2095.codfw.wmnet with reason: host reimage
  • 20:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2110.codfw.wmnet with OS bullseye
  • 20:17 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2095.codfw.wmnet with reason: host reimage
  • 20:16 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2110 to cirrussearch2110
  • 20:15 jdrewniak@deploy1003: jdrewniak, dani, bwang: Backport for Design Research Participant Survey: Deploy (T392325), Enable reading list beta feature for beta cluster (T390881), Create EventStream configuration for PES1.3 Wikirun Game synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:15 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2110
  • 20:15 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2110
  • 20:14 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:14 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2110 to cirrussearch2110 - bking@cumin2002"
  • 20:14 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2110 to cirrussearch2110 - bking@cumin2002"
  • 20:11 jdrewniak@deploy1003: Started scap sync-world: Backport for Design Research Participant Survey: Deploy (T392325), Enable reading list beta feature for beta cluster (T390881), Create EventStream configuration for PES1.3 Wikirun Game
  • 20:10 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:09 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2110 to cirrussearch2110
  • 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2095
  • 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2095
  • 20:00 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2095
  • 20:00 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2095.codfw.wmnet 232.16.192.10.in-addr.arpa 2.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:59 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2095.codfw.wmnet 232.16.192.10.in-addr.arpa 2.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:59 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2095 - bking@cumin2002"
  • 19:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2095 - bking@cumin2002"
  • 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2095
  • 19:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2095.codfw.wmnet with OS bullseye
  • 19:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2095 to cirrussearch2095
  • 19:52 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2095
  • 19:51 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2095
  • 19:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2095 to cirrussearch2095 - bking@cumin2002"
  • 19:51 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2095 to cirrussearch2095 - bking@cumin2002"
  • 19:47 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:47 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2095 to cirrussearch2095
  • 18:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2094.codfw.wmnet with OS bullseye
  • 18:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2050.codfw.wmnet with OS bookworm
  • 18:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2094.codfw.wmnet with reason: host reimage
  • 17:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2094.codfw.wmnet with reason: host reimage
  • 17:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2094
  • 17:42 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2094
  • 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2094
  • 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2094.codfw.wmnet 230.16.192.10.in-addr.arpa 0.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2094.codfw.wmnet 230.16.192.10.in-addr.arpa 0.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2094 - bking@cumin2002"
  • 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2094 - bking@cumin2002"
  • 17:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2050.codfw.wmnet with reason: host reimage
  • 17:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2050.codfw.wmnet with reason: host reimage
  • 17:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2050.codfw.wmnet with OS bookworm
  • 17:24 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 17:24 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2094
  • 17:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2094.codfw.wmnet with OS bullseye
  • 17:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2094 to cirrussearch2094
  • 17:23 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2094
  • 17:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2094
  • 17:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:22 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2094 to cirrussearch2094 - bking@cumin2002"
  • 17:22 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2094 to cirrussearch2094 - bking@cumin2002"
  • 17:18 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 17:18 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2094 to cirrussearch2094
  • 16:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2064.codfw.wmnet with OS bullseye
  • 16:13 urandom: decommissioning Cassandra/restbase1030-{a,b,c} — T389423
  • 16:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2064.codfw.wmnet with reason: host reimage
  • 16:11 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1030.eqiad.wmnet with reason: Decommissioning — T378725
  • 16:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2064.codfw.wmnet with reason: host reimage
  • 16:03 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2103.codfw.wmnet|cirrussearch2104.codfw.wmnet|cirrussearch2105.codfw.wmnet|cirrussearch2107.codfw.wmnet|cirrussearch2109.codfw.wmnet|cirrussearch2111.codfw.wmnet|cirrussearch2112.codfw.wmnet|cirrussearch2114.codfw.wmnet|cirrussearch2115.codfw.wmnet
  • 16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2097\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2091\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2090\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2089\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2088\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2087\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2085\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2082\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2079\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2077\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2075\.codfw\.wmnet
  • 16:00 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2074\.codfw\.wmnet
  • 15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2072\.codfw\.wmnet
  • 15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2070\.codfw\.wmnet
  • 15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2069\.codfw\.wmnet
  • 15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2068\.codfw\.wmnet
  • 15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2067\.codfw\.wmnet
  • 15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2066\.codfw\.wmnet
  • 15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2065\.codfw\.wmnet
  • 15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2063\.codfw\.wmnet
  • 15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2062\.codfw\.wmnet
  • 15:59 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2060\.codfw\.wmnet
  • 15:58 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2059\.codfw\.wmnet
  • 15:58 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2058\.codfw\.wmnet
  • 15:58 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2057\.codfw\.wmnet
  • 15:58 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2056\.codfw\.wmnet
  • 15:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2064
  • 15:53 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2064
  • 15:53 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2064
  • 15:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2064.codfw.wmnet 109.16.192.10.in-addr.arpa 9.0.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:53 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2064.codfw.wmnet 109.16.192.10.in-addr.arpa 9.0.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2064 - bking@cumin2002"
  • 15:53 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2064 - bking@cumin2002"
  • 15:52 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2057\.codfw\.wmnet
  • 15:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:48 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:47 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2064
  • 15:47 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2064.codfw.wmnet with OS bullseye
  • 15:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2045.codfw.wmnet with OS bookworm
  • 15:45 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2057
  • 15:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2064 to cirrussearch2064
  • 15:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2064
  • 15:43 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2064
  • 15:43 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:43 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2064 to cirrussearch2064 - bking@cumin2002"
  • 15:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2064 to cirrussearch2064 - bking@cumin2002"
  • 15:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
  • 15:38 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:37 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2064 to cirrussearch2064
  • 15:23 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 15:21 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 15:09 bking@cumin2002: conftool action : set/pooled=yes; selector: name=elastic2078\.codfw\.wmnet
  • 15:03 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
  • 14:57 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
  • 14:56 bking@cumin2002: conftool action : set/pooled=no; selector: name=elastic2078\.codfw\.wmnet
  • 14:49 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 14:46 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 14:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2057.codfw.wmnet with OS bullseye
  • 14:28 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
  • 14:19 taavi@deploy1003: Finished scap sync-world: Backport for Design Research Participant Survey: Pre-deploy (T392325) (duration: 14m 53s)
  • 14:12 taavi@deploy1003: taavi, dani: Continuing with sync
  • 14:09 taavi@deploy1003: taavi, dani: Backport for Design Research Participant Survey: Pre-deploy (T392325) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:04 taavi@deploy1003: Started scap sync-world: Backport for Design Research Participant Survey: Pre-deploy (T392325)
  • 14:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2057.codfw.wmnet with reason: host reimage
  • 14:00 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2057.codfw.wmnet with reason: host reimage
  • 13:53 taavi: taavi@deploy1003 ~ $ echo "https://en.wikipedia.org/static/images/mobile/copyright/wikimaniawiki-wordmark.svg" | mwscript-k8s --attach purgeList.php -- --wiki enwiki
  • 13:49 taavi@deploy1003: Finished scap sync-world: Backport for wikimaniawiki: update logo to 2025 (T392239), Enable mobile sitenotice for shwiki (T392334) (duration: 41m 04s)
  • 13:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2057
  • 13:44 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2057
  • 13:43 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2057
  • 13:43 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2057.codfw.wmnet 204.16.192.10.in-addr.arpa 4.0.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:43 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2057.codfw.wmnet 204.16.192.10.in-addr.arpa 4.0.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:43 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2057 - bking@cumin2002"
  • 13:43 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2057 - bking@cumin2002"
  • 13:40 taavi@deploy1003: robertsky, taavi, aleksandar: Continuing with sync
  • 13:39 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:38 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2057
  • 13:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2057.codfw.wmnet with OS bullseye
  • 13:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2057 to cirrussearch2057
  • 13:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2057
  • 13:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2057
  • 13:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:36 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2057 to cirrussearch2057 - bking@cumin2002"
  • 13:36 taavi@deploy1003: robertsky, taavi, aleksandar: Backport for wikimaniawiki: update logo to 2025 (T392239), Enable mobile sitenotice for shwiki (T392334) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2057 to cirrussearch2057 - bking@cumin2002"
  • 13:30 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:29 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2057 to cirrussearch2057
  • 13:08 taavi@deploy1003: Started scap sync-world: Backport for wikimaniawiki: update logo to 2025 (T392239), Enable mobile sitenotice for shwiki (T392334)
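
The sre.dns.wipe-cache entries above list reverse-DNS (PTR) names next to each hostname. As a minimal sketch, not taken from the cookbook itself, Python's standard `ipaddress` module can derive such names from an address. The helper name `ptr_name` is hypothetical, and the two addresses are an assumption: they were obtained by reading the PTR names logged for cirrussearch2057 at 13:43 backwards.

```python
import ipaddress


def ptr_name(address: str) -> str:
    """Return the reverse-DNS (PTR) owner name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(address).reverse_pointer


# Addresses reconstructed by reversing the PTR names in the 13:43 entries
# (an assumption for illustration, not values stated in the log).
print(ptr_name("10.192.16.204"))
print(ptr_name("2620:0:860:102:10:192:16:204"))
```

For IPv4 the octets are reversed under `in-addr.arpa`; for IPv6 every hex nibble of the fully expanded address is reversed under `ip6.arpa`, which is why the IPv6 names in the log are 32 labels long.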

2025-04-19

  • 16:48 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1002.eqiad.wmnet
  • 16:44 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1002.eqiad.wmnet
  • 16:44 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1003.eqiad.wmnet
  • 16:40 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1003.eqiad.wmnet
  • 16:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl2003.codfw.wmnet
  • 16:38 elukey: `sudo gnt-instance modify -B memory=6g,vcpus=4 aux-k8s-ctrl1002.eqiad.wmnet` - T392289
  • 16:38 elukey: `sudo gnt-instance modify -B memory=6g,vcpus=4 aux-k8s-ctrl1003.eqiad.wmnet` - T392289
  • 16:36 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl2003.codfw.wmnet
  • 16:35 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl2002.codfw.wmnet
  • 16:32 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl2002.codfw.wmnet
  • 16:30 elukey: `sudo gnt-instance modify -B memory=6g,vcpus=4 aux-k8s-ctrl2002.codfw.wmnet` - T392289
  • 16:30 elukey: `sudo gnt-instance modify -B memory=6g,vcpus=4 aux-k8s-ctrl2003.codfw.wmnet` - T392289
  • 00:26 urandom: decommissioning Cassandra/restbase1029-{a,b,c} — T389423
  • 00:24 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1029.eqiad.wmnet with reason: Decommissioning — T389423
  • 00:13 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
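
Cookbook runs in this log appear as a START line followed later by an END or DONE line carrying PASS or FAIL. A rough sketch of how such entries might be parsed to estimate run durations is shown below. The pattern and the helper names `parse_entry` and `minutes_between` are hypothetical, inferred only from the lines on this page, and the log's one-minute resolution limits the precision.

```python
import re
from datetime import datetime

# Pattern inferred from the entries on this page (an assumption, not a
# documented format): "HH:MM user@host: STATE - Cookbook name ...".
ENTRY = re.compile(
    r"^(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"(?P<state>START|(?:END|DONE) \((?:PASS|FAIL)\)) - "
    r"Cookbook (?P<cookbook>\S+)"
)


def parse_entry(line: str):
    """Return time, actor, state and cookbook name, or None if no match."""
    text = line.strip().lstrip("•").strip()
    match = ENTRY.match(text)
    return match.groupdict() if match else None


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two HH:MM stamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


start = parse_entry(
    "  • 16:44 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm "
    "for VM aux-k8s-ctrl1002.eqiad.wmnet"
)
end = parse_entry(
    "  • 16:48 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm "
    "(exit_code=0) for VM aux-k8s-ctrl1002.eqiad.wmnet"
)
print(start["cookbook"], minutes_between(start["time"], end["time"]))
```

The sketch does not handle runs that cross midnight, and free-form entries (for example manual `gnt-instance` notes) simply return `None`.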

2025-04-18

  • 23:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:56 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:54 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:53 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1179
  • 21:52 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1179
  • 21:51 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:48 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:11 brett@dns1005: END - running authdns-update
  • 21:09 brett@dns1005: START - running authdns-update
  • 15:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
  • 14:53 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1015.eqiad.wmnet
  • 14:45 eevans@cumin1002: START - Cookbook sre.hosts.reboot-single for host aqs1015.eqiad.wmnet
  • 14:45 urandom: rebooting aqs1015.eqiad.wmnet (drive detection/ordering) — T391903
  • 14:40 _joe_: enabled slow query log on db1218, investigating T390510
  • 11:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 200132
  • 11:17 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 200132
  • 10:08 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:57 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:56 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:53 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:52 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe1016
  • 09:52 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe1016
  • 09:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:49 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:48 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:48 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ms-fe1016] - vriley@cumin1002"
  • 09:47 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ms-fe1016] - vriley@cumin1002"
  • 09:43 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 09:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-fe1015']
  • 09:39 vriley@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1015']
  • 09:38 vriley@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-fe1015']
  • 09:38 vriley@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1015']
  • 09:02 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ms-fe1015.eqiad.wmnet with OS bullseye
  • 09:00 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:45 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1182.eqiad.wmnet with OS bullseye
  • 08:45 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 08:40 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 08:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:37 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe1015
  • 08:37 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe1015
  • 08:36 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:36 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ms-fe1015] - vriley@cumin1002"
  • 08:36 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ms-fe1015] - vriley@cumin1002"
  • 08:30 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 08:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:18 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1182.eqiad.wmnet with reason: host reimage
  • 08:14 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1182.eqiad.wmnet with reason: host reimage
  • 07:58 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1182.eqiad.wmnet with OS bullseye
  • 07:57 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1182.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:56 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1182.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:52 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:45 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:22 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1184.eqiad.wmnet with OS bullseye
  • 07:22 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 06:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 06:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T391056)', diff saved to https://phabricator.wikimedia.org/P75283 and previous config saved to /var/cache/conftool/dbconfig/20250418-062830-fceratto.json
  • 06:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P75282 and previous config saved to /var/cache/conftool/dbconfig/20250418-061324-fceratto.json
  • 05:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P75281 and previous config saved to /var/cache/conftool/dbconfig/20250418-055816-fceratto.json
  • 05:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T391056)', diff saved to https://phabricator.wikimedia.org/P75280 and previous config saved to /var/cache/conftool/dbconfig/20250418-054309-fceratto.json
  • 05:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T391056)', diff saved to https://phabricator.wikimedia.org/P75279 and previous config saved to /var/cache/conftool/dbconfig/20250418-053713-fceratto.json
  • 05:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 05:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T391056)', diff saved to https://phabricator.wikimedia.org/P75278 and previous config saved to /var/cache/conftool/dbconfig/20250418-053648-fceratto.json
  • 05:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P75277 and previous config saved to /var/cache/conftool/dbconfig/20250418-052141-fceratto.json
  • 05:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P75276 and previous config saved to /var/cache/conftool/dbconfig/20250418-050635-fceratto.json
  • 04:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T391056)', diff saved to https://phabricator.wikimedia.org/P75275 and previous config saved to /var/cache/conftool/dbconfig/20250418-045127-fceratto.json
  • 04:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T391056)', diff saved to https://phabricator.wikimedia.org/P75274 and previous config saved to /var/cache/conftool/dbconfig/20250418-044545-fceratto.json
  • 04:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 04:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75273 and previous config saved to /var/cache/conftool/dbconfig/20250418-044523-fceratto.json
  • 04:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P75272 and previous config saved to /var/cache/conftool/dbconfig/20250418-043015-fceratto.json
  • 04:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P75271 and previous config saved to /var/cache/conftool/dbconfig/20250418-041508-fceratto.json
  • 04:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75270 and previous config saved to /var/cache/conftool/dbconfig/20250418-040001-fceratto.json
  • 03:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T391056)', diff saved to https://phabricator.wikimedia.org/P75269 and previous config saved to /var/cache/conftool/dbconfig/20250418-035406-fceratto.json
  • 03:53 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 03:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T391056)', diff saved to https://phabricator.wikimedia.org/P75268 and previous config saved to /var/cache/conftool/dbconfig/20250418-035342-fceratto.json
  • 03:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P75267 and previous config saved to /var/cache/conftool/dbconfig/20250418-033834-fceratto.json
  • 03:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P75266 and previous config saved to /var/cache/conftool/dbconfig/20250418-032327-fceratto.json
  • 03:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T391056)', diff saved to https://phabricator.wikimedia.org/P75265 and previous config saved to /var/cache/conftool/dbconfig/20250418-030820-fceratto.json
  • 03:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T391056)', diff saved to https://phabricator.wikimedia.org/P75264 and previous config saved to /var/cache/conftool/dbconfig/20250418-030239-fceratto.json
  • 03:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 03:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75263 and previous config saved to /var/cache/conftool/dbconfig/20250418-030216-fceratto.json
  • 02:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P75262 and previous config saved to /var/cache/conftool/dbconfig/20250418-024709-fceratto.json
  • 02:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P75261 and previous config saved to /var/cache/conftool/dbconfig/20250418-023202-fceratto.json
  • 02:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75260 and previous config saved to /var/cache/conftool/dbconfig/20250418-021655-fceratto.json
  • 02:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T391056)', diff saved to https://phabricator.wikimedia.org/P75259 and previous config saved to /var/cache/conftool/dbconfig/20250418-021122-fceratto.json
  • 02:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 02:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 02:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T391056)', diff saved to https://phabricator.wikimedia.org/P75258 and previous config saved to /var/cache/conftool/dbconfig/20250418-020728-fceratto.json
  • 01:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P75257 and previous config saved to /var/cache/conftool/dbconfig/20250418-015221-fceratto.json
  • 01:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P75256 and previous config saved to /var/cache/conftool/dbconfig/20250418-013714-fceratto.json
  • 01:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T391056)', diff saved to https://phabricator.wikimedia.org/P75255 and previous config saved to /var/cache/conftool/dbconfig/20250418-012207-fceratto.json
  • 01:16 wfan: civicrm upgraded from 38a7a649 to d7eefbc4
  • 01:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T391056)', diff saved to https://phabricator.wikimedia.org/P75254 and previous config saved to /var/cache/conftool/dbconfig/20250418-011558-fceratto.json
  • 01:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 01:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T391056)', diff saved to https://phabricator.wikimedia.org/P75253 and previous config saved to /var/cache/conftool/dbconfig/20250418-011536-fceratto.json
  • 01:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P75252 and previous config saved to /var/cache/conftool/dbconfig/20250418-010030-fceratto.json
  • 00:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P75251 and previous config saved to /var/cache/conftool/dbconfig/20250418-004524-fceratto.json
  • 00:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T391056)', diff saved to https://phabricator.wikimedia.org/P75250 and previous config saved to /var/cache/conftool/dbconfig/20250418-003016-fceratto.json
  • 00:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T391056)', diff saved to https://phabricator.wikimedia.org/P75249 and previous config saved to /var/cache/conftool/dbconfig/20250418-002408-fceratto.json
  • 00:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 00:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T391056)', diff saved to https://phabricator.wikimedia.org/P75248 and previous config saved to /var/cache/conftool/dbconfig/20250418-002344-fceratto.json
  • 00:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P75247 and previous config saved to /var/cache/conftool/dbconfig/20250418-000838-fceratto.json
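
Each dbctl commit above records the path where the previous configuration was saved, and the filename appears to begin with a date and time. Assuming that prefix follows a `YYYYMMDD-HHMMSS` layout (inferred from the entries, not confirmed by dbctl documentation), the spacing between repooling steps can be estimated as in the sketch below. The helper name `backup_time` is hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath


def backup_time(path: str) -> datetime:
    """Parse the assumed YYYYMMDD-HHMMSS prefix of a saved dbconfig filename."""
    stem = PurePosixPath(path).stem          # e.g. 20250418-054309-fceratto
    stamp = "-".join(stem.split("-")[:2])    # keep only the date and time parts
    return datetime.strptime(stamp, "%Y%m%d-%H%M%S")


# Paths copied from the 05:43 and 05:58 db2237 repooling entries.
first = backup_time("/var/cache/conftool/dbconfig/20250418-054309-fceratto.json")
second = backup_time("/var/cache/conftool/dbconfig/20250418-055816-fceratto.json")
print(second - first)
```

For these two entries the gap comes out at a little over 15 minutes, in line with the spacing of the logged repooling steps.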

2025-04-17

  • 23:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P75246 and previous config saved to /var/cache/conftool/dbconfig/20250417-235331-fceratto.json
  • 23:40 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 23:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T391056)', diff saved to https://phabricator.wikimedia.org/P75245 and previous config saved to /var/cache/conftool/dbconfig/20250417-233825-fceratto.json
  • 23:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T391056)', diff saved to https://phabricator.wikimedia.org/P75244 and previous config saved to /var/cache/conftool/dbconfig/20250417-233211-fceratto.json
  • 23:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 23:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 23:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T391056)', diff saved to https://phabricator.wikimedia.org/P75243 and previous config saved to /var/cache/conftool/dbconfig/20250417-233131-fceratto.json
  • 23:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P75242 and previous config saved to /var/cache/conftool/dbconfig/20250417-231625-fceratto.json
  • 23:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P75241 and previous config saved to /var/cache/conftool/dbconfig/20250417-230118-fceratto.json
  • 22:58 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 22:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2097.codfw.wmnet with OS bullseye
  • 22:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T391056)', diff saved to https://phabricator.wikimedia.org/P75240 and previous config saved to /var/cache/conftool/dbconfig/20250417-224611-fceratto.json
  • 22:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T391056)', diff saved to https://phabricator.wikimedia.org/P75239 and previous config saved to /var/cache/conftool/dbconfig/20250417-223957-fceratto.json
  • 22:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 22:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1252.eqiad.wmnet with reason: Maintenance
  • 22:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P75238 and previous config saved to /var/cache/conftool/dbconfig/20250417-223130-fceratto.json
  • 22:30 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 22:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2097.codfw.wmnet with reason: host reimage
  • 22:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 22:20 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 22:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 22:19 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 22:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2097.codfw.wmnet with reason: host reimage
  • 22:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P75237 and previous config saved to /var/cache/conftool/dbconfig/20250417-221623-fceratto.json
  • 22:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 22:15 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 22:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 22:14 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1184.eqiad.wmnet with reason: host reimage
  • 22:13 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2078.codfw.wmnet']
  • 22:10 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1184.eqiad.wmnet with reason: host reimage
  • 22:04 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2078.codfw.wmnet']
  • 22:04 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cirrussearch2078.codfw.wmnet']
  • 22:03 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2078.codfw.wmnet']
  • 22:02 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 22:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2097
  • 22:01 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2097
  • 22:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P75236 and previous config saved to /var/cache/conftool/dbconfig/20250417-220116-fceratto.json
  • 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2097
  • 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2097.codfw.wmnet 234.16.192.10.in-addr.arpa 4.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2097.codfw.wmnet 234.16.192.10.in-addr.arpa 4.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2097 - bking@cumin2002"
  • 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2097 - bking@cumin2002"
  • 21:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 21:58 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 21:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 21:57 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 21:54 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:52 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1184.eqiad.wmnet with OS bullseye
  • 21:50 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2097
  • 21:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2097.codfw.wmnet with OS bullseye
  • 21:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P75235 and previous config saved to /var/cache/conftool/dbconfig/20250417-214610-fceratto.json
  • 21:46 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2078
  • 21:46 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 21:46 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 21:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2097.codfw.wmnet on all recursors
  • 21:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2097.codfw.wmnet on all recursors
  • 21:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2097 to cirrussearch2097
  • 21:42 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2097
  • 21:41 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2097
  • 21:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:41 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2097 to cirrussearch2097 - bking@cumin2002"
  • 21:40 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2097 to cirrussearch2097 - bking@cumin2002"
  • 21:37 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1183.eqiad.wmnet with OS bullseye
  • 21:35 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:20 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:19 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2097 to cirrussearch2097
  • 21:19 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 21:19 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host cirrussearch2078
  • 21:19 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2078
  • 21:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2078.codfw.wmnet with OS bullseye
  • 21:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2078.codfw.wmnet on all recursors
  • 21:18 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2078.codfw.wmnet on all recursors
  • 21:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2078 to cirrussearch2078
  • 21:17 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2078
  • 21:17 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2078
  • 21:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:17 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2078 to cirrussearch2078 - bking@cumin2002"
  • 21:17 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2078 to cirrussearch2078 - bking@cumin2002"
  • 21:16 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:15 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1183.eqiad.wmnet with reason: host reimage
  • 21:13 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:12 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2078 to cirrussearch2078
  • 21:11 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1183.eqiad.wmnet with reason: host reimage
  • 21:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2058.codfw.wmnet with OS bullseye
  • 20:56 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye
  • 20:55 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:50 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1178.eqiad.wmnet with OS bullseye
  • 20:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P75234 and previous config saved to /var/cache/conftool/dbconfig/20250417-204552-fceratto.json
  • 20:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 20:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T391056)', diff saved to https://phabricator.wikimedia.org/P75233 and previous config saved to /var/cache/conftool/dbconfig/20250417-204528-fceratto.json
  • 20:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2058.codfw.wmnet with reason: host reimage
  • 20:37 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:34 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2058.codfw.wmnet with reason: host reimage
  • 20:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P75232 and previous config saved to /var/cache/conftool/dbconfig/20250417-203021-fceratto.json
  • 20:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1181.eqiad.wmnet with OS bullseye
  • 20:25 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1178.eqiad.wmnet with reason: host reimage
  • 20:25 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1178.eqiad.wmnet with reason: host reimage
  • 20:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2058
  • 20:18 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2058
  • 20:18 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2058
  • 20:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2058.codfw.wmnet 205.16.192.10.in-addr.arpa 5.0.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:18 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2058.codfw.wmnet 205.16.192.10.in-addr.arpa 5.0.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:18 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2058 - bking@cumin2002"
  • 20:18 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2058 - bking@cumin2002"
  • 20:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P75231 and previous config saved to /var/cache/conftool/dbconfig/20250417-201515-fceratto.json
  • 20:13 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:10 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1178.eqiad.wmnet with OS bullseye
  • 20:09 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:09 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2058
  • 20:08 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2058.codfw.wmnet with OS bullseye
  • 20:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2058.codfw.wmnet on all recursors
  • 20:07 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2058.codfw.wmnet on all recursors
  • 20:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2058 to cirrussearch2058
  • 20:06 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2058
  • 20:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2058
  • 20:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:06 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2058 to cirrussearch2058 - bking@cumin2002"
  • 20:05 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2058 to cirrussearch2058 - bking@cumin2002"
  • 20:02 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1181.eqiad.wmnet with reason: host reimage
  • 20:00 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:00 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2058 to cirrussearch2058
  • 20:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T391056)', diff saved to https://phabricator.wikimedia.org/P75230 and previous config saved to /var/cache/conftool/dbconfig/20250417-200008-fceratto.json
  • 19:59 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 19:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 19:58 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 19:58 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1181.eqiad.wmnet with reason: host reimage
  • 19:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T391056)', diff saved to https://phabricator.wikimedia.org/P75229 and previous config saved to /var/cache/conftool/dbconfig/20250417-195506-fceratto.json
  • 19:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 19:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T391056)', diff saved to https://phabricator.wikimedia.org/P75228 and previous config saved to /var/cache/conftool/dbconfig/20250417-195442-fceratto.json
  • 19:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:50 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1178.eqiad.wmnet with OS bullseye
  • 19:44 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
  • 19:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P75226 and previous config saved to /var/cache/conftool/dbconfig/20250417-193935-fceratto.json
  • 19:36 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1178.eqiad.wmnet with OS bullseye
  • 19:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1178.eqiad.wmnet with OS bullseye
  • 19:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P75225 and previous config saved to /var/cache/conftool/dbconfig/20250417-192430-fceratto.json
  • 19:22 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1178.eqiad.wmnet with OS bullseye
  • 19:21 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:14 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T391056)', diff saved to https://phabricator.wikimedia.org/P75223 and previous config saved to /var/cache/conftool/dbconfig/20250417-190923-fceratto.json
  • 19:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T391056)', diff saved to https://phabricator.wikimedia.org/P75222 and previous config saved to /var/cache/conftool/dbconfig/20250417-190331-fceratto.json
  • 19:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 18:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 18:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T391056)', diff saved to https://phabricator.wikimedia.org/P75221 and previous config saved to /var/cache/conftool/dbconfig/20250417-185930-fceratto.json
  • 18:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P75219 and previous config saved to /var/cache/conftool/dbconfig/20250417-184423-fceratto.json
  • 18:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P75218 and previous config saved to /var/cache/conftool/dbconfig/20250417-182916-fceratto.json
  • 18:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T391056)', diff saved to https://phabricator.wikimedia.org/P75217 and previous config saved to /var/cache/conftool/dbconfig/20250417-181408-fceratto.json
  • 18:13 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.25 refs T386220
  • 17:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T391056)', diff saved to https://phabricator.wikimedia.org/P75216 and previous config saved to /var/cache/conftool/dbconfig/20250417-175614-fceratto.json
  • 17:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 17:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T391056)', diff saved to https://phabricator.wikimedia.org/P75215 and previous config saved to /var/cache/conftool/dbconfig/20250417-175552-fceratto.json
  • 17:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P75214 and previous config saved to /var/cache/conftool/dbconfig/20250417-174046-fceratto.json
  • 17:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P75213 and previous config saved to /var/cache/conftool/dbconfig/20250417-172539-fceratto.json
  • 17:20 mutante: idp-test2005 - 100% disk space used - alerting for over 6 days (is there a point in alerts for test hosts?) - apt-get clean brought it back to 94%
  • 17:12 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T391056)', diff saved to https://phabricator.wikimedia.org/P75212 and previous config saved to /var/cache/conftool/dbconfig/20250417-171032-fceratto.json
  • 17:09 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:08 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T391056)', diff saved to https://phabricator.wikimedia.org/P75211 and previous config saved to /var/cache/conftool/dbconfig/20250417-170438-fceratto.json
  • 17:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 17:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T391056)', diff saved to https://phabricator.wikimedia.org/P75210 and previous config saved to /var/cache/conftool/dbconfig/20250417-170416-fceratto.json
  • 17:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2066.codfw.wmnet with OS bullseye
  • 16:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P75209 and previous config saved to /var/cache/conftool/dbconfig/20250417-164909-fceratto.json
  • 16:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2066.codfw.wmnet with reason: host reimage
  • 16:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P75208 and previous config saved to /var/cache/conftool/dbconfig/20250417-163403-fceratto.json
  • 16:32 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2066.codfw.wmnet with reason: host reimage
  • 16:30 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 16:30 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 16:30 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 16:30 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T391056)', diff saved to https://phabricator.wikimedia.org/P75207 and previous config saved to /var/cache/conftool/dbconfig/20250417-161854-fceratto.json
  • 16:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2066
  • 16:17 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2066
  • 16:15 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2066
  • 16:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2066.codfw.wmnet 69.32.192.10.in-addr.arpa 9.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:15 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2066.codfw.wmnet 69.32.192.10.in-addr.arpa 9.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:15 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2066 - bking@cumin2002"
  • 16:15 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2066 - bking@cumin2002"
  • 16:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T391056)', diff saved to https://phabricator.wikimedia.org/P75206 and previous config saved to /var/cache/conftool/dbconfig/20250417-161307-fceratto.json
  • 16:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 16:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75205 and previous config saved to /var/cache/conftool/dbconfig/20250417-161245-fceratto.json
  • 16:11 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:11 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2066
  • 16:10 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2066.codfw.wmnet with OS bullseye
  • 16:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2066 to cirrussearch2066
  • 16:09 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2066
  • 16:09 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2066
  • 16:09 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2066 to cirrussearch2066 - bking@cumin2002"
  • 16:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2066 to cirrussearch2066 - bking@cumin2002"
  • 15:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P75204 and previous config saved to /var/cache/conftool/dbconfig/20250417-155738-fceratto.json
  • 15:54 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:53 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2066 to cirrussearch2066
  • 15:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P75201 and previous config saved to /var/cache/conftool/dbconfig/20250417-154231-fceratto.json
  • 15:34 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwdebug2001.codfw.wmnet with OS bullseye
  • 15:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75200 and previous config saved to /var/cache/conftool/dbconfig/20250417-152724-fceratto.json
  • 15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75199 and previous config saved to /var/cache/conftool/dbconfig/20250417-151330-fceratto.json
  • 15:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75198 and previous config saved to /var/cache/conftool/dbconfig/20250417-151308-fceratto.json
  • 14:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P75197 and previous config saved to /var/cache/conftool/dbconfig/20250417-145801-fceratto.json
  • 14:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug2001.codfw.wmnet with reason: host reimage
  • 14:55 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage cirrussearch hosts - bking@cumin2002 - T388610
  • 14:53 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug2001.codfw.wmnet with reason: host reimage
  • 14:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P75196 and previous config saved to /var/cache/conftool/dbconfig/20250417-144254-fceratto.json
  • 14:36 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mwdebug2001.codfw.wmnet with OS bullseye
  • 14:32 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:31 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75195 and previous config saved to /var/cache/conftool/dbconfig/20250417-142746-fceratto.json
  • 14:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75194 and previous config saved to /var/cache/conftool/dbconfig/20250417-142221-fceratto.json
  • 14:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 14:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T391056)', diff saved to https://phabricator.wikimedia.org/P75193 and previous config saved to /var/cache/conftool/dbconfig/20250417-142139-fceratto.json
  • 14:11 hashar: Restarting Gerrit to apply replication configuration change
  • 14:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P75191 and previous config saved to /var/cache/conftool/dbconfig/20250417-140632-fceratto.json
  • 14:02 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug2001.codfw.wmnet
  • 14:02 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug1002.codfw.wmnet
  • 14:02 jiji@cumin1002: conftool action : set/pooled=yes; selector: name=mwdebug2002.codfw.wmnet
  • 14:01 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug2002.codfw.wmnet
  • 13:57 dcausse: closing the UTC afternoon backport window
  • 13:54 dcausse@deploy1003: Finished scap sync-world: Backport for wikimaniawiki: add extendedconfirmed to translationadmin (T389729) (duration: 13m 25s)
  • 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P75190 and previous config saved to /var/cache/conftool/dbconfig/20250417-135125-fceratto.json
  • 13:49 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage cirrussearch hosts - bking@cumin2002 - T388610
  • 13:48 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage cirrussearch hosts - bking@cumin2002 - T388610
  • 13:47 dcausse@deploy1003: dcausse, robertsky: Continuing with sync
  • 13:46 dcausse@deploy1003: dcausse, robertsky: Backport for wikimaniawiki: add extendedconfirmed to translationadmin (T389729) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:41 dcausse@deploy1003: Started scap sync-world: Backport for wikimaniawiki: add extendedconfirmed to translationadmin (T389729)
  • 13:38 dcausse@deploy1003: Finished scap sync-world: Backport for Gracefully handle BadRevisionException (T382904), Gracefully handle BadRevisionException (T382904) (duration: 12m 23s)
  • 13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T391056)', diff saved to https://phabricator.wikimedia.org/P75189 and previous config saved to /var/cache/conftool/dbconfig/20250417-133618-fceratto.json
  • 13:31 dcausse@deploy1003: dcausse: Continuing with sync
  • 13:30 dcausse@deploy1003: dcausse: Backport for Gracefully handle BadRevisionException (T382904), Gracefully handle BadRevisionException (T382904) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T391056)', diff saved to https://phabricator.wikimedia.org/P75188 and previous config saved to /var/cache/conftool/dbconfig/20250417-133004-fceratto.json
  • 13:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 13:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T391056)', diff saved to https://phabricator.wikimedia.org/P75187 and previous config saved to /var/cache/conftool/dbconfig/20250417-132942-fceratto.json
  • 13:28 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage cirrussearch hosts - bking@cumin2002 - T388610
  • 13:25 dcausse@deploy1003: Started scap sync-world: Backport for Gracefully handle BadRevisionException (T382904), Gracefully handle BadRevisionException (T382904)
  • 13:17 dreamyjazz@deploy1003: Finished scap sync-world: Backport for frwiki: Add abusefilter-access-protected-vars to EFM, remove it from sysops. (T381722) (duration: 13m 18s)
  • 13:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P75186 and previous config saved to /var/cache/conftool/dbconfig/20250417-131435-fceratto.json
  • 13:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1180.eqiad.wmnet with OS bullseye
  • 13:10 dreamyjazz@deploy1003: dreamyjazz, wpld: Continuing with sync
  • 13:09 dreamyjazz@deploy1003: dreamyjazz, wpld: Backport for frwiki: Add abusefilter-access-protected-vars to EFM, remove it from sysops. (T381722) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:03 dreamyjazz@deploy1003: Started scap sync-world: Backport for frwiki: Add abusefilter-access-protected-vars to EFM, remove it from sysops. (T381722)
  • 12:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P75185 and previous config saved to /var/cache/conftool/dbconfig/20250417-125928-fceratto.json
  • 12:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be1065.eqiad.wmnet
  • 12:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.remove-downtime for ms-be1065.eqiad.wmnet
  • 12:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1180.eqiad.wmnet with reason: host reimage
  • 12:46 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:46 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1180.eqiad.wmnet with reason: host reimage
  • 12:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T391056)', diff saved to https://phabricator.wikimedia.org/P75184 and previous config saved to /var/cache/conftool/dbconfig/20250417-124421-fceratto.json
  • 12:43 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:42 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:38 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T391056)', diff saved to https://phabricator.wikimedia.org/P75183 and previous config saved to /var/cache/conftool/dbconfig/20250417-123804-fceratto.json
  • 12:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T391056)', diff saved to https://phabricator.wikimedia.org/P75182 and previous config saved to /var/cache/conftool/dbconfig/20250417-123742-fceratto.json
  • 12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1180.eqiad.wmnet with OS bullseye
  • 12:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1065.eqiad.wmnet with reason: vacuum overlarge container dbs
  • 12:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be1066.eqiad.wmnet
  • 12:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.remove-downtime for ms-be1066.eqiad.wmnet
  • 12:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P75181 and previous config saved to /var/cache/conftool/dbconfig/20250417-122235-fceratto.json
  • 12:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P75180 and previous config saved to /var/cache/conftool/dbconfig/20250417-120728-fceratto.json
  • 11:57 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum overlarge container dbs
  • 11:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T391056)', diff saved to https://phabricator.wikimedia.org/P75179 and previous config saved to /var/cache/conftool/dbconfig/20250417-115221-fceratto.json
  • 11:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T391056)', diff saved to https://phabricator.wikimedia.org/P75178 and previous config saved to /var/cache/conftool/dbconfig/20250417-114551-fceratto.json
  • 11:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 11:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum overlarge container dbs
  • 09:14 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@1e9e1f9]: bump image suggestions to 1.5.0 (duration: 01m 54s)
  • 09:13 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@1e9e1f9]: bump image suggestions to 1.5.0
  • 08:58 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1091.eqiad.wmnet with OS bullseye
  • 08:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1091.eqiad.wmnet with reason: host reimage
  • 08:50 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet with OS bookworm
  • 08:49 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1091.eqiad.wmnet with reason: host reimage
  • 08:34 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1091.eqiad.wmnet with OS bullseye
  • 08:34 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be1091.eqiad.wmnet with OS bullseye
  • 08:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: host reimage
  • 08:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: host reimage
  • 08:20 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1091.eqiad.wmnet with OS bullseye
  • 08:12 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve-ctrl1002.eqiad.wmnet with OS bookworm
  • 07:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:41 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 05:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.gerrit.failover (exit_code=0) from gerrit2002.wikimedia.org to gerrit2003.wikimedia.org
  • 05:28 arnaudb@cumin1002: START - Cookbook sre.gerrit.failover from gerrit2002.wikimedia.org to gerrit2003.wikimedia.org
  • 05:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.gerrit.failover (exit_code=0) from gerrit2002.wikimedia.org to gerrit2003.wikimedia.org
  • 05:27 arnaudb@cumin1002: START - Cookbook sre.gerrit.failover from gerrit2002.wikimedia.org to gerrit2003.wikimedia.org
  • 05:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.gerrit.failover (exit_code=0) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 05:03 arnaudb@cumin1002: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 03:54 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 03:54 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 02:48 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 02:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
  • 02:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
  • 02:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 02:22 TheresNoTime: [samtar@mwmaint1002 ~]$ mwscript maintenance/cleanupTitles.php --wiki=shwiktionary # `Razgovor:Vikirečnik:Srpskohrvatski`
  • 00:55 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 00:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75177 and previous config saved to /var/cache/conftool/dbconfig/20250417-002743-fceratto.json
  • 00:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P75176 and previous config saved to /var/cache/conftool/dbconfig/20250417-001235-fceratto.json

2025-04-16

  • 23:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P75175 and previous config saved to /var/cache/conftool/dbconfig/20250416-235728-fceratto.json
  • 23:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75174 and previous config saved to /var/cache/conftool/dbconfig/20250416-234221-fceratto.json
  • 23:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
  • 23:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
  • 23:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 23:33 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 23:33 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
  • 23:33 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
  • 23:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 23:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T391056)', diff saved to https://phabricator.wikimedia.org/P75173 and previous config saved to /var/cache/conftool/dbconfig/20250416-233200-fceratto.json
  • 23:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2238.codfw.wmnet with reason: Maintenance
  • 23:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T391056)', diff saved to https://phabricator.wikimedia.org/P75172 and previous config saved to /var/cache/conftool/dbconfig/20250416-233148-fceratto.json
  • 23:28 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 23:16 urandom: decommissioning restbase1028/Cassandra — T389423
  • 23:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P75171 and previous config saved to /var/cache/conftool/dbconfig/20250416-231641-fceratto.json
  • 23:16 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
  • 23:15 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
  • 23:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 23:15 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1028.eqiad.wmnet with reason: Decommissioning — T389423
  • 23:14 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 23:11 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1045.eqiad.wmnet
  • 23:11 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1045.eqiad.wmnet
  • 23:11 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1044.eqiad.wmnet
  • 23:11 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1044.eqiad.wmnet
  • 23:10 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1043.eqiad.wmnet
  • 23:10 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1043.eqiad.wmnet
  • 23:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P75170 and previous config saved to /var/cache/conftool/dbconfig/20250416-230134-fceratto.json
  • 22:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
  • 22:54 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
  • 22:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 22:49 aude@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 22:49 aude@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 22:46 aude@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 22:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T391056)', diff saved to https://phabricator.wikimedia.org/P75169 and previous config saved to /var/cache/conftool/dbconfig/20250416-224627-fceratto.json
  • 22:46 aude@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 22:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2226 (T391056)', diff saved to https://phabricator.wikimedia.org/P75168 and previous config saved to /var/cache/conftool/dbconfig/20250416-224405-fceratto.json
  • 22:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 22:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2226.codfw.wmnet with reason: Maintenance
  • 22:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T391056)', diff saved to https://phabricator.wikimedia.org/P75167 and previous config saved to /var/cache/conftool/dbconfig/20250416-224325-fceratto.json
  • 22:36 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 22:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P75166 and previous config saved to /var/cache/conftool/dbconfig/20250416-222818-fceratto.json
  • 22:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2079.codfw.wmnet with OS bullseye
  • 22:21 aude@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 22:20 aude@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 22:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P75165 and previous config saved to /var/cache/conftool/dbconfig/20250416-221311-fceratto.json
  • 22:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2079.codfw.wmnet with reason: host reimage
  • 21:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T391056)', diff saved to https://phabricator.wikimedia.org/P75164 and previous config saved to /var/cache/conftool/dbconfig/20250416-215804-fceratto.json
  • 21:56 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2079.codfw.wmnet with reason: host reimage
  • 21:55 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2.*
  • 21:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T391056)', diff saved to https://phabricator.wikimedia.org/P75163 and previous config saved to /var/cache/conftool/dbconfig/20250416-214710-fceratto.json
  • 21:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2225.codfw.wmnet with reason: Maintenance
  • 21:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T391056)', diff saved to https://phabricator.wikimedia.org/P75162 and previous config saved to /var/cache/conftool/dbconfig/20250416-214648-fceratto.json
  • 21:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2079
  • 21:41 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2079
  • 21:41 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2079
  • 21:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2079.codfw.wmnet 128.16.192.10.in-addr.arpa 8.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:41 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2079.codfw.wmnet 128.16.192.10.in-addr.arpa 8.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:41 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2079 - bking@cumin2002"
  • 21:41 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2079 - bking@cumin2002"
  • 21:33 reedy@deploy1003: Finished scap sync-world: Backport for specials: Fix PHP Warning on Special:PasswordReset for crafted input (T392086) (duration: 11m 47s)
  • 21:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P75161 and previous config saved to /var/cache/conftool/dbconfig/20250416-213141-fceratto.json
  • 21:30 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:28 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2079
  • 21:27 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2079.codfw.wmnet with OS bullseye
  • 21:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2079.codfw.wmnet on all recursors
  • 21:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2079.codfw.wmnet on all recursors
  • 21:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2079 to cirrussearch2079
  • 21:26 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2079
  • 21:26 reedy@deploy1003: reedy: Continuing with sync
  • 21:26 reedy@deploy1003: reedy: Backport for specials: Fix PHP Warning on Special:PasswordReset for crafted input (T392086) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:25 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2079
  • 21:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2079 to cirrussearch2079 - bking@cumin2002"
  • 21:21 reedy@deploy1003: Started scap sync-world: Backport for specials: Fix PHP Warning on Special:PasswordReset for crafted input (T392086)
  • 21:18 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2079 to cirrussearch2079 - bking@cumin2002"
  • 21:17 ecarg@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:16 ecarg@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P75160 and previous config saved to /var/cache/conftool/dbconfig/20250416-211634-fceratto.json
  • 21:16 ecarg@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:15 ecarg@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:14 ecarg@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:13 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 ecarg@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:13 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2079 to cirrussearch2079
  • 21:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2077.codfw.wmnet with OS bullseye
  • 21:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T391056)', diff saved to https://phabricator.wikimedia.org/P75159 and previous config saved to /var/cache/conftool/dbconfig/20250416-210128-fceratto.json
  • 20:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T391056)', diff saved to https://phabricator.wikimedia.org/P75158 and previous config saved to /var/cache/conftool/dbconfig/20250416-205907-fceratto.json
  • 20:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 20:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2077.codfw.wmnet with reason: host reimage
  • 20:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 20:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T391056)', diff saved to https://phabricator.wikimedia.org/P75157 and previous config saved to /var/cache/conftool/dbconfig/20250416-204957-fceratto.json
  • 20:48 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2077.codfw.wmnet with reason: host reimage
  • 20:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P75156 and previous config saved to /var/cache/conftool/dbconfig/20250416-203450-fceratto.json
  • 20:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2077
  • 20:34 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2077
  • 20:33 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2077
  • 20:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2077.codfw.wmnet 125.16.192.10.in-addr.arpa 5.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:33 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2077.codfw.wmnet 125.16.192.10.in-addr.arpa 5.2.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:33 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2077 - bking@cumin2002"
  • 20:33 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2077 - bking@cumin2002"
  • 20:28 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:28 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2077
  • 20:28 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2077.codfw.wmnet with OS bullseye
  • 20:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2077.codfw.wmnet on all recursors
  • 20:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2077.codfw.wmnet on all recursors
  • 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2077 to cirrussearch2077
  • 20:26 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2077
  • 20:26 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2077
  • 20:26 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:26 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2077 to cirrussearch2077 - bking@cumin2002"
  • 20:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2077 to cirrussearch2077 - bking@cumin2002"
  • 20:20 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:20 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2077 to cirrussearch2077
  • 20:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P75155 and previous config saved to /var/cache/conftool/dbconfig/20250416-201943-fceratto.json
  • 20:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2063.codfw.wmnet with OS bullseye
  • 20:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T391056)', diff saved to https://phabricator.wikimedia.org/P75154 and previous config saved to /var/cache/conftool/dbconfig/20250416-200437-fceratto.json
  • 19:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T391056)', diff saved to https://phabricator.wikimedia.org/P75153 and previous config saved to /var/cache/conftool/dbconfig/20250416-195408-fceratto.json
  • 19:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 19:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T391056)', diff saved to https://phabricator.wikimedia.org/P75152 and previous config saved to /var/cache/conftool/dbconfig/20250416-195345-fceratto.json
  • 19:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2063.codfw.wmnet with reason: host reimage
  • 19:45 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2063.codfw.wmnet with reason: host reimage
  • 19:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P75151 and previous config saved to /var/cache/conftool/dbconfig/20250416-193838-fceratto.json
  • 19:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 19:33 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 19:30 swfrench@deploy1003: Stopping before sync operations
  • 19:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2063
  • 19:30 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2063
  • 19:30 swfrench@deploy1003: Started scap sync-world: Test stop-before-sync scap run to pick up make-container-image changes - T390251
  • 19:30 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2063
  • 19:30 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2063.codfw.wmnet 108.16.192.10.in-addr.arpa 8.0.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:30 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2063.codfw.wmnet 108.16.192.10.in-addr.arpa 8.0.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:30 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:30 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2063 - bking@cumin2002"
  • 19:30 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2063 - bking@cumin2002"
  • 19:25 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:25 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2063
  • 19:25 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2063.codfw.wmnet with OS bullseye
  • 19:24 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2063.codfw.wmnet on all recursors
  • 19:24 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2063.codfw.wmnet on all recursors
  • 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2063 to cirrussearch2063
  • 19:23 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2063
  • 19:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2063
  • 19:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P75150 and previous config saved to /var/cache/conftool/dbconfig/20250416-192330-fceratto.json
  • 19:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2063 to cirrussearch2063 - bking@cumin2002"
  • 19:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2063 to cirrussearch2063 - bking@cumin2002"
  • 19:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T391056)', diff saved to https://phabricator.wikimedia.org/P75149 and previous config saved to /var/cache/conftool/dbconfig/20250416-190823-fceratto.json
  • 19:06 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:06 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2063 to cirrussearch2063
  • 18:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T391056)', diff saved to https://phabricator.wikimedia.org/P75148 and previous config saved to /var/cache/conftool/dbconfig/20250416-185651-fceratto.json
  • 18:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 18:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T391056)', diff saved to https://phabricator.wikimedia.org/P75147 and previous config saved to /var/cache/conftool/dbconfig/20250416-185628-fceratto.json
  • 18:44 sukhe: re-enable puppet on A:durum
  • 18:42 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.25 refs T386220
  • 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3003.esams.wmnet with OS bookworm
  • 18:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P75146 and previous config saved to /var/cache/conftool/dbconfig/20250416-184121-fceratto.json
  • 18:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P75145 and previous config saved to /var/cache/conftool/dbconfig/20250416-182613-fceratto.json
  • 18:22 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
  • 18:19 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
  • 18:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T391056)', diff saved to https://phabricator.wikimedia.org/P75144 and previous config saved to /var/cache/conftool/dbconfig/20250416-181105-fceratto.json
  • 18:08 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 18:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2070.codfw.wmnet with OS bullseye
  • 17:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T391056)', diff saved to https://phabricator.wikimedia.org/P75142 and previous config saved to /var/cache/conftool/dbconfig/20250416-175842-fceratto.json
  • 17:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 17:55 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum3003.esams.wmnet with OS bookworm
  • 17:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 17:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T391056)', diff saved to https://phabricator.wikimedia.org/P75140 and previous config saved to /var/cache/conftool/dbconfig/20250416-174828-fceratto.json
  • 17:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2070.codfw.wmnet with reason: host reimage
  • 17:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2070.codfw.wmnet with reason: host reimage
  • 17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P75139 and previous config saved to /var/cache/conftool/dbconfig/20250416-173320-fceratto.json
  • 17:33 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:33 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P75138 and previous config saved to /var/cache/conftool/dbconfig/20250416-171813-fceratto.json
  • 17:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T391056)', diff saved to https://phabricator.wikimedia.org/P75137 and previous config saved to /var/cache/conftool/dbconfig/20250416-170305-fceratto.json
  • 17:00 cgoubert@deploy1003: Finished scap sync-world: Deploy mediawiki chart 0.8.11 (duration: 03m 02s)
  • 16:58 cgoubert@deploy1003: Started scap sync-world: Deploy mediawiki chart 0.8.11
  • 16:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T391056)', diff saved to https://phabricator.wikimedia.org/P75136 and previous config saved to /var/cache/conftool/dbconfig/20250416-165118-fceratto.json
  • 16:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1254.eqiad.wmnet with reason: Maintenance
  • 16:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 16:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T391056)', diff saved to https://phabricator.wikimedia.org/P75135 and previous config saved to /var/cache/conftool/dbconfig/20250416-164216-fceratto.json
  • 16:37 kevinbazira@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 16:36 kevinbazira@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2070
  • 16:36 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2070
  • 16:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2070
  • 16:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2070.codfw.wmnet 110.16.192.10.in-addr.arpa 0.1.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:36 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2070.codfw.wmnet 110.16.192.10.in-addr.arpa 0.1.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:33 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2070
  • 16:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2070.codfw.wmnet with OS bullseye
  • 16:32 kevinbazira@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 16:32 kevinbazira@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 16:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P75133 and previous config saved to /var/cache/conftool/dbconfig/20250416-162709-fceratto.json
  • 16:23 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 16:22 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2110.codfw.wmnet on all recursors
  • 16:22 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2110.codfw.wmnet on all recursors
  • 16:22 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2095.codfw.wmnet on all recursors
  • 16:22 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2095.codfw.wmnet on all recursors
  • 16:21 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2070.codfw.wmnet with OS bullseye
  • 16:21 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=93) for host cirrussearch2070
  • 16:21 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:20 cgoubert@deploy1003: Finished scap sync-world: Deploy mediawiki chart 0.8.10 (duration: 03m 20s)
  • 16:18 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:18 cgoubert@deploy1003: Started scap sync-world: Deploy mediawiki chart 0.8.10
  • 16:18 bking@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cirrussearch2070
  • 16:18 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2070
  • 16:18 bking@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cirrussearch2070
  • 16:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P75132 and previous config saved to /var/cache/conftool/dbconfig/20250416-161202-fceratto.json
  • 16:11 sukhe@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3003.esams.wmnet with reason: testing ECH
  • 16:10 sukhe: stopping bird on durum3003 to temporarily disable advertising of anycast IPs
  • 16:08 sukhe: sudo cumin 'A:durum' 'disable-puppet "rolling out CR 1136772"'
  • 16:07 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2070
  • 16:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2070.codfw.wmnet 110.16.192.10.in-addr.arpa 0.1.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:07 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2070.codfw.wmnet 110.16.192.10.in-addr.arpa 0.1.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:07 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2070 - bking@cumin2002"
  • 16:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2070 - bking@cumin2002"
  • 16:07 kevinbazira@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 16:07 kevinbazira@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 16:01 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2070
  • 16:00 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2070.codfw.wmnet with OS bullseye
  • 15:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2070.codfw.wmnet on all recursors
  • 15:59 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2070.codfw.wmnet on all recursors
  • 15:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2070 to cirrussearch2070
  • 15:58 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2070
  • 15:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2070
  • 15:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2070 to cirrussearch2070 - bking@cumin2002"
  • 15:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T391056)', diff saved to https://phabricator.wikimedia.org/P75129 and previous config saved to /var/cache/conftool/dbconfig/20250416-155655-fceratto.json
  • 15:51 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2070 to cirrussearch2070 - bking@cumin2002"
  • 15:46 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:46 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2070 to cirrussearch2070
  • 15:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T391056)', diff saved to https://phabricator.wikimedia.org/P75128 and previous config saved to /var/cache/conftool/dbconfig/20250416-154515-fceratto.json
  • 15:45 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 15:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 15:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T391056)', diff saved to https://phabricator.wikimedia.org/P75127 and previous config saved to /var/cache/conftool/dbconfig/20250416-154452-fceratto.json
  • 15:32 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P75126 and previous config saved to /var/cache/conftool/dbconfig/20250416-152945-fceratto.json
  • 15:17 sukhe@dns1004: END - running authdns-update
  • 15:14 sukhe@dns1004: START - running authdns-update
  • 15:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P75125 and previous config saved to /var/cache/conftool/dbconfig/20250416-151438-fceratto.json
  • 14:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T391056)', diff saved to https://phabricator.wikimedia.org/P75124 and previous config saved to /var/cache/conftool/dbconfig/20250416-145928-fceratto.json
  • 14:57 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T391056)', diff saved to https://phabricator.wikimedia.org/P75123 and previous config saved to /var/cache/conftool/dbconfig/20250416-145718-fceratto.json
  • 14:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 14:53 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:52 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 14:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75122 and previous config saved to /var/cache/conftool/dbconfig/20250416-144750-fceratto.json
  • 14:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 14:44 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 14:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P75121 and previous config saved to /var/cache/conftool/dbconfig/20250416-143242-fceratto.json
  • 14:29 sukhe@dns1004: END - running authdns-update
  • 14:27 sukhe@dns1004: START - running authdns-update
  • 14:26 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 14:26 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 14:22 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia nginx_1.22.1-9+deb12u1+ech3_amd64.changes: T205378
  • 14:18 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - bking@cumin2002 - T388610
  • 14:17 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2101.codfw.wmnet on all recursors
  • 14:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P75120 and previous config saved to /var/cache/conftool/dbconfig/20250416-141735-fceratto.json
  • 14:17 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2101.codfw.wmnet on all recursors
  • 14:17 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2099.codfw.wmnet on all recursors
  • 14:17 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2099.codfw.wmnet on all recursors
  • 14:17 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2071.codfw.wmnet on all recursors
  • 14:17 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2071.codfw.wmnet on all recursors
  • 14:16 brouberol@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row B - brouberol@cumin2002 - T388610
  • 14:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75119 and previous config saved to /var/cache/conftool/dbconfig/20250416-140228-fceratto.json
  • 13:55 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:55 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1045.eqiad.wmnet with reason: Bootstrapping — T389423
  • 13:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 13:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 13:53 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Release campaignEvents extension to azwiki (T390805) (duration: 19m 09s)
  • 13:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 13:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 13:51 jelto: "Imported helm311 3.11.3-4 to bullseye-wikimedia and bookworm-wikimedia - T387548"
  • 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75118 and previous config saved to /var/cache/conftool/dbconfig/20250416-135121-fceratto.json
  • 13:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T391056)', diff saved to https://phabricator.wikimedia.org/P75117 and previous config saved to /var/cache/conftool/dbconfig/20250416-135059-fceratto.json
  • 13:47 lucaswerkmeister-wmde@deploy1003: mhorsey, lucaswerkmeister-wmde: Continuing with sync
  • 13:44 lucaswerkmeister-wmde@deploy1003: mhorsey, lucaswerkmeister-wmde: Backport for Release campaignEvents extension to azwiki (T390805) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P75116 and previous config saved to /var/cache/conftool/dbconfig/20250416-133552-fceratto.json
  • 13:34 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Release campaignEvents extension to azwiki (T390805)
  • 13:28 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for search-redirect: fix case-sensitivity of project name (T391297) (duration: 22m 55s)
  • 13:24 godog: finish rollout of thanos 0.38 to prometheus* - T383966
  • 13:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P75115 and previous config saved to /var/cache/conftool/dbconfig/20250416-132043-fceratto.json
  • 13:20 lucaswerkmeister-wmde@deploy1003: wargo, lucaswerkmeister-wmde: Continuing with sync
  • 13:18 godog: bounce thanos on titan100* - overload
  • 13:17 lucaswerkmeister-wmde@deploy1003: wargo, lucaswerkmeister-wmde: Backport for search-redirect: fix case-sensitivity of project name (T391297) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:06 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for search-redirect: fix case-sensitivity of project name (T391297)
  • 13:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T391056)', diff saved to https://phabricator.wikimedia.org/P75114 and previous config saved to /var/cache/conftool/dbconfig/20250416-130536-fceratto.json
  • 13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T391056)', diff saved to https://phabricator.wikimedia.org/P75113 and previous config saved to /var/cache/conftool/dbconfig/20250416-130326-fceratto.json
  • 13:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75112 and previous config saved to /var/cache/conftool/dbconfig/20250416-130303-fceratto.json
  • 12:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P75111 and previous config saved to /var/cache/conftool/dbconfig/20250416-124755-fceratto.json
  • 12:37 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aphlict2001.codfw.wmnet with OS bookworm
  • 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P75109 and previous config saved to /var/cache/conftool/dbconfig/20250416-123248-fceratto.json
  • 12:23 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 12:23 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 12:17 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75108 and previous config saved to /var/cache/conftool/dbconfig/20250416-121742-fceratto.json
  • 12:17 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T391056)', diff saved to https://phabricator.wikimedia.org/P75107 and previous config saved to /var/cache/conftool/dbconfig/20250416-121532-fceratto.json
  • 12:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75106 and previous config saved to /var/cache/conftool/dbconfig/20250416-121509-fceratto.json
  • 12:14 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:14 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:13 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:13 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:11 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
  • 12:08 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
  • 12:00 cgoubert@deploy1003: Finished scap sync-world: Move mwscript wrapper from base image to copy on build - T391665 (duration: 50m 43s)
  • 12:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P75104 and previous config saved to /var/cache/conftool/dbconfig/20250416-120002-fceratto.json
  • 11:57 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:57 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:52 aokoth@cumin1002: START - Cookbook sre.hosts.reimage for host aphlict2001.codfw.wmnet with OS bookworm
  • 11:52 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:51 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:51 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on aphlict2001.codfw.wmnet with reason: Bookworm Re-image
  • 11:51 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:51 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P75103 and previous config saved to /var/cache/conftool/dbconfig/20250416-114455-fceratto.json
  • 11:41 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:41 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:37 jelto: temporarily disable query sites on miscweb vms - T350793
  • 11:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75102 and previous config saved to /var/cache/conftool/dbconfig/20250416-112948-fceratto.json
  • 11:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75101 and previous config saved to /var/cache/conftool/dbconfig/20250416-111822-fceratto.json
  • 11:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T391056)', diff saved to https://phabricator.wikimedia.org/P75100 and previous config saved to /var/cache/conftool/dbconfig/20250416-111759-fceratto.json
  • 11:11 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:10 cgoubert@deploy1003: Started scap sync-world: Move mwscript wrapper from base image to copy on build - T391665
  • 11:09 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 11:06 claime: Rebuilding php base images to pick up 1135922 - T391665
  • 11:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P75099 and previous config saved to /var/cache/conftool/dbconfig/20250416-110252-fceratto.json
  • 10:58 cgoubert@deploy1003: Finished scap build-images: (no justification provided) (duration: 05m 36s)
  • 10:52 cgoubert@deploy1003: Started scap build-images: (no justification provided)
  • 10:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P75098 and previous config saved to /var/cache/conftool/dbconfig/20250416-104744-fceratto.json
  • 10:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T391056)', diff saved to https://phabricator.wikimedia.org/P75097 and previous config saved to /var/cache/conftool/dbconfig/20250416-103236-fceratto.json
  • 10:29 MichaelG_WMF: migr@mwmaint1002:/srv/mediawiki/php-1.44.0-wmf.24$ time mwscript ./extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki ruwiki --verbose #T391695
  • 10:23 MichaelG_WMF: migr@mwmaint1002:/srv/mediawiki/php-1.44.0-wmf.24$ time mwscript ./extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki frwiki --verbose #T391695
  • 10:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T391056)', diff saved to https://phabricator.wikimedia.org/P75096 and previous config saved to /var/cache/conftool/dbconfig/20250416-102110-fceratto.json
  • 10:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:20 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 10:19 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database nupwiki (T390714)
  • 10:19 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database nupwiki (T390714)
  • 10:17 MichaelG_WMF: migr@mwmaint1002:/srv/mediawiki/php-1.44.0-wmf.25$ mwscript ./extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki testwiki --verbose #T391695
  • 10:13 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:11 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 09:54 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet with OS bookworm
  • 09:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for Change default thumbnail size to 250px (T355914) (duration: 19m 35s)
  • 09:36 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 09:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: host reimage
  • 09:35 ladsgroup@deploy1003: ladsgroup: Backport for Change default thumbnail size to 250px (T355914) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:32 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: host reimage
  • 09:23 ladsgroup@deploy1003: Started scap sync-world: Backport for Change default thumbnail size to 250px (T355914)
  • 09:22 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 100% (T360589) (duration: 19m 05s)
  • 09:18 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve-ctrl1001.eqiad.wmnet with OS bookworm
  • 09:15 vgutierrez: repooling cp4047 - T387238
  • 09:15 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 09:15 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 100% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:02 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 100% (T360589)
  • 09:02 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command '['helmfile', '-e', 'eqiad', '--selector', 'name=main', 'write-values', '--output-file-template', '/tmp/tmpsh_tee3p']' returned non-zero exit status 3. (scap version: 4.153.0) (duration: 15m 58s)
  • 08:59 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 08:58 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 100% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 100% (T360589)
  • 08:16 akosiaris: destroy the "main" helmfile releases for mw-wikifunctions. The service is now being powered by the single version MediaWiki HTTP routing solution releases, this is a cleanup.
  • 07:50 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 07:26 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:26 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:02 elukey: powercycle ml-serve2007 - OEM event registered in getsel (seems DIMM-related)
  • 06:09 volans: installing spicerack v10.1.0 on cumin1002
  • 05:38 volans: installing spicerack v10.1.0 on cumin2002
  • 02:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75094 and previous config saved to /var/cache/conftool/dbconfig/20250416-023052-fceratto.json
  • 02:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P75093 and previous config saved to /var/cache/conftool/dbconfig/20250416-021544-fceratto.json
  • 02:05 ejegg: payments-wiki upgraded from ba6e8d65 to 4ad609b4
  • 02:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P75092 and previous config saved to /var/cache/conftool/dbconfig/20250416-020036-fceratto.json
  • 01:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75091 and previous config saved to /var/cache/conftool/dbconfig/20250416-014529-fceratto.json
  • 01:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T391056)', diff saved to https://phabricator.wikimedia.org/P75090 and previous config saved to /var/cache/conftool/dbconfig/20250416-012924-fceratto.json
  • 01:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 01:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75089 and previous config saved to /var/cache/conftool/dbconfig/20250416-012901-fceratto.json
  • 01:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P75088 and previous config saved to /var/cache/conftool/dbconfig/20250416-011353-fceratto.json
  • 00:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P75087 and previous config saved to /var/cache/conftool/dbconfig/20250416-005846-fceratto.json
  • 00:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75086 and previous config saved to /var/cache/conftool/dbconfig/20250416-004338-fceratto.json
  • 00:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T391056)', diff saved to https://phabricator.wikimedia.org/P75085 and previous config saved to /var/cache/conftool/dbconfig/20250416-002725-fceratto.json
  • 00:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 00:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T391056)', diff saved to https://phabricator.wikimedia.org/P75084 and previous config saved to /var/cache/conftool/dbconfig/20250416-002703-fceratto.json
  • 00:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P75083 and previous config saved to /var/cache/conftool/dbconfig/20250416-001156-fceratto.json
  • 00:02 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610

2025-04-15

  • 23:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P75082 and previous config saved to /var/cache/conftool/dbconfig/20250415-235649-fceratto.json
  • 23:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2112.codfw.wmnet with OS bullseye
  • 23:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T391056)', diff saved to https://phabricator.wikimedia.org/P75081 and previous config saved to /var/cache/conftool/dbconfig/20250415-234142-fceratto.json
  • 23:32 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2112.codfw.wmnet with reason: host reimage
  • 23:27 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2112.codfw.wmnet with reason: host reimage
  • 23:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T391056)', diff saved to https://phabricator.wikimedia.org/P75080 and previous config saved to /var/cache/conftool/dbconfig/20250415-232535-fceratto.json
  • 23:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 23:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T391056)', diff saved to https://phabricator.wikimedia.org/P75079 and previous config saved to /var/cache/conftool/dbconfig/20250415-232511-fceratto.json
  • 23:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2112
  • 23:11 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2112
  • 23:11 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2112.codfw.wmnet with OS bullseye
  • 23:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P75078 and previous config saved to /var/cache/conftool/dbconfig/20250415-231003-fceratto.json
  • 23:10 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2112.codfw.wmnet on all recursors
  • 23:10 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2112.codfw.wmnet on all recursors
  • 23:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2112 to cirrussearch2112
  • 23:09 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2112
  • 23:09 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2112
  • 23:09 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2112 to cirrussearch2112 - bking@cumin2002"
  • 22:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P75077 and previous config saved to /var/cache/conftool/dbconfig/20250415-225456-fceratto.json
  • 22:52 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2112 to cirrussearch2112 - bking@cumin2002"
  • 22:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T391056)', diff saved to https://phabricator.wikimedia.org/P75076 and previous config saved to /var/cache/conftool/dbconfig/20250415-223949-fceratto.json
  • 22:35 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 22:35 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2112 to cirrussearch2112
  • 22:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T391056)', diff saved to https://phabricator.wikimedia.org/P75075 and previous config saved to /var/cache/conftool/dbconfig/20250415-222316-fceratto.json
  • 22:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 22:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2103.codfw.wmnet with OS bullseye
  • 22:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 21:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 21:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75074 and previous config saved to /var/cache/conftool/dbconfig/20250415-215714-fceratto.json
  • 21:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1180.eqiad.wmnet with OS bullseye
  • 21:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2103.codfw.wmnet with reason: host reimage
  • 21:46 urandom: bootstrapping Cassandra/restbase1045-{a,b,c} — T389423
  • 21:44 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2103.codfw.wmnet with reason: host reimage
  • 21:42 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1045.eqiad.wmnet with reason: Bootstrapping — T389423
  • 21:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P75073 and previous config saved to /var/cache/conftool/dbconfig/20250415-214206-fceratto.json
  • 21:41 jforrester@deploy1003: Finished scap sync-world: Backport for FetchHandler: Disable on non-repo wikis (T392014), FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014), FetchHandler: Disable on non-repo wikis (T392014), FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014
  • 21:27 jforrester@deploy1003: jforrester: Continuing with sync
  • 21:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2103
  • 21:27 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2103
  • 21:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2103
  • 21:27 jforrester@deploy1003: jforrester: Backport for FetchHandler: Disable on non-repo wikis (T392014), FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014), FetchHandler: Disable on non-repo wikis (T392014), FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014) synced to
  • 21:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2103.codfw.wmnet 222.32.192.10.in-addr.arpa 2.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2103.codfw.wmnet 222.32.192.10.in-addr.arpa 2.2.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:27 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2103 - bking@cumin2002"
  • 21:27 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2103 - bking@cumin2002"
  • 21:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P75072 and previous config saved to /var/cache/conftool/dbconfig/20250415-212659-fceratto.json
  • 21:23 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:22 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2103
  • 21:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2103.codfw.wmnet with OS bullseye
  • 21:22 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2103.codfw.wmnet on all recursors
  • 21:22 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2103.codfw.wmnet on all recursors
  • 21:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2103 to cirrussearch2103
  • 21:21 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2103
  • 21:21 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2103
  • 21:21 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:21 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2103 to cirrussearch2103 - bking@cumin2002"
  • 21:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2103 to cirrussearch2103 - bking@cumin2002"
  • 21:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75071 and previous config saved to /var/cache/conftool/dbconfig/20250415-211152-fceratto.json
  • 21:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1180.eqiad.wmnet with OS bullseye
  • 21:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1180.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T391056)', diff saved to https://phabricator.wikimedia.org/P75070 and previous config saved to /var/cache/conftool/dbconfig/20250415-205427-fceratto.json
  • 20:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 20:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T391056)', diff saved to https://phabricator.wikimedia.org/P75069 and previous config saved to /var/cache/conftool/dbconfig/20250415-205416-fceratto.json
  • 20:53 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:53 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2103 to cirrussearch2103
  • 20:51 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1180.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:48 jforrester@deploy1003: Started scap sync-world: Backport for FetchHandler: Disable on non-repo wikis (T392014), FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014), FetchHandler: Disable on non-repo wikis (T392014), FetchHandler: Don't read from the DB in getParamSettings on non-repo wikis either (T392014)
  • 20:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P75068 and previous config saved to /var/cache/conftool/dbconfig/20250415-203909-fceratto.json
  • 20:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2082.codfw.wmnet with OS bullseye
  • 20:27 volans: uploaded spicerack_10.1.0 to apt.wikimedia.org bullseye-wikimedia
  • 20:24 jforrester@deploy1003: Finished scap sync-world: Backport for wikimaniawiki: fix add/remove groups (T389729) (duration: 21m 04s)
  • 20:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P75067 and previous config saved to /var/cache/conftool/dbconfig/20250415-202401-fceratto.json
  • 20:17 jforrester@deploy1003: robertsky, jforrester: Continuing with sync
  • 20:15 jforrester@deploy1003: robertsky, jforrester: Backport for wikimaniawiki: fix add/remove groups (T389729) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2082.codfw.wmnet with reason: host reimage
  • 20:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T391056)', diff saved to https://phabricator.wikimedia.org/P75066 and previous config saved to /var/cache/conftool/dbconfig/20250415-200855-fceratto.json
  • 20:07 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2082.codfw.wmnet with reason: host reimage
  • 20:03 jforrester@deploy1003: Started scap sync-world: Backport for wikimaniawiki: fix add/remove groups (T389729)
  • 19:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2082
  • 19:52 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2082
  • 19:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T391056)', diff saved to https://phabricator.wikimedia.org/P75065 and previous config saved to /var/cache/conftool/dbconfig/20250415-195157-fceratto.json
  • 19:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 19:51 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2082
  • 19:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2082.codfw.wmnet 87.32.192.10.in-addr.arpa 7.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:51 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2082.codfw.wmnet 87.32.192.10.in-addr.arpa 7.8.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2082 - bking@cumin2002"
  • 19:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T391056)', diff saved to https://phabricator.wikimedia.org/P75064 and previous config saved to /var/cache/conftool/dbconfig/20250415-195134-fceratto.json
  • 19:51 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2082 - bking@cumin2002"
  • 19:47 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:46 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2082
  • 19:46 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet
  • 19:46 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2082.codfw.wmnet with OS bullseye
  • 19:46 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2082.codfw.wmnet on all recursors
  • 19:46 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2082.codfw.wmnet on all recursors
  • 19:46 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2082 to cirrussearch2082
  • 19:45 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2082
  • 19:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P75063 and previous config saved to /var/cache/conftool/dbconfig/20250415-193627-fceratto.json
  • 19:35 jforrester@deploy1003: Finished scap sync-world: Backport for VE: Start setting wgVisualEditorMobileInsertMenu, default to off (T388604), VE: Set wgVisualEditorMobileInsertMenu true on Wikifunctions client wikis (T383145 T388604), [wikifunctionswiki] Enable Wikifunctions client mode (T383106), [dagwiki] Enable Wikifunctions client mode (T383106)
  • 19:31 jforrester@deploy1003: jforrester: Continuing with sync
  • 19:25 jforrester@deploy1003: jforrester: Backport for VE: Start setting wgVisualEditorMobileInsertMenu, default to off (T388604), VE: Set wgVisualEditorMobileInsertMenu true on Wikifunctions client wikis (T383145 T388604), [wikifunctionswiki] Enable Wikifunctions client mode (T383106), [dagwiki] Enable Wikifunctions client mode (T383106) synced to t
  • 19:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P75062 and previous config saved to /var/cache/conftool/dbconfig/20250415-192120-fceratto.json
  • 19:14 jforrester@deploy1003: Started scap sync-world: Backport for VE: Start setting wgVisualEditorMobileInsertMenu, default to off (T388604), VE: Set wgVisualEditorMobileInsertMenu true on Wikifunctions client wikis (T383145 T388604), [wikifunctionswiki] Enable Wikifunctions client mode (T383106), [dagwiki] Enable Wikifunctions client mode (T383106)
  • 19:10 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2082
  • 19:10 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:10 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2082 to cirrussearch2082 - bking@cumin2002"
  • 19:10 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2082 to cirrussearch2082 - bking@cumin2002"
  • 19:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T391056)', diff saved to https://phabricator.wikimedia.org/P75061 and previous config saved to /var/cache/conftool/dbconfig/20250415-190613-fceratto.json
  • 19:05 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:05 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2082 to cirrussearch2082
  • 19:03 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 18:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T391056)', diff saved to https://phabricator.wikimedia.org/P75060 and previous config saved to /var/cache/conftool/dbconfig/20250415-185000-fceratto.json
  • 18:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 18:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T391056)', diff saved to https://phabricator.wikimedia.org/P75059 and previous config saved to /var/cache/conftool/dbconfig/20250415-184921-fceratto.json
  • 18:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P75058 and previous config saved to /var/cache/conftool/dbconfig/20250415-183413-fceratto.json
  • 18:29 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.25 refs T386220
  • 18:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P75057 and previous config saved to /var/cache/conftool/dbconfig/20250415-181906-fceratto.json
  • 18:05 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 18:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T391056)', diff saved to https://phabricator.wikimedia.org/P75056 and previous config saved to /var/cache/conftool/dbconfig/20250415-180400-fceratto.json
  • 18:01 sukhe: removing from reprepro -C component/nginx-ech libssl and openssl packages
  • 18:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_eqiad and A:cp
  • 17:57 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_eqiad and A:cp
  • 17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T391056)', diff saved to https://phabricator.wikimedia.org/P75055 and previous config saved to /var/cache/conftool/dbconfig/20250415-174734-fceratto.json
  • 17:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 17:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T391056)', diff saved to https://phabricator.wikimedia.org/P75054 and previous config saved to /var/cache/conftool/dbconfig/20250415-174653-fceratto.json
  • 17:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P75053 and previous config saved to /var/cache/conftool/dbconfig/20250415-173146-fceratto.json
  • 17:24 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@f650091]: Pickup latest artifacts. T391280. (duration: 01m 08s)
  • 17:23 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@f650091]: Pickup latest artifacts. T391280.
  • 17:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P75052 and previous config saved to /var/cache/conftool/dbconfig/20250415-171639-fceratto.json
  • 17:14 sukhe@dns1004: END - running authdns-update
  • 17:11 sukhe@dns1004: START - running authdns-update
  • 17:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T391056)', diff saved to https://phabricator.wikimedia.org/P75051 and previous config saved to /var/cache/conftool/dbconfig/20250415-170132-fceratto.json
  • 16:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1253 (T391056)', diff saved to https://phabricator.wikimedia.org/P75050 and previous config saved to /var/cache/conftool/dbconfig/20250415-165922-fceratto.json
  • 16:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: Maintenance
  • 16:59 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 16:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T391056)', diff saved to https://phabricator.wikimedia.org/P75049 and previous config saved to /var/cache/conftool/dbconfig/20250415-165859-fceratto.json
  • 16:58 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 16:48 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 16:46 sukhe@dns1004: END - running authdns-update
  • 16:46 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 16:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2102.codfw.wmnet on all recursors
  • 16:45 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2102.codfw.wmnet on all recursors
  • 16:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2098.codfw.wmnet on all recursors
  • 16:45 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2098.codfw.wmnet on all recursors
  • 16:45 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from elastic2098 to cirrussearch2098
  • 16:45 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P75048 and previous config saved to /var/cache/conftool/dbconfig/20250415-164350-fceratto.json
  • 16:43 sukhe@dns1004: START - running authdns-update
  • 16:42 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:42 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2098 to cirrussearch2098
  • 16:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P75047 and previous config saved to /var/cache/conftool/dbconfig/20250415-162842-fceratto.json
  • 16:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2059.codfw.wmnet with OS bullseye
  • 16:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T391056)', diff saved to https://phabricator.wikimedia.org/P75046 and previous config saved to /var/cache/conftool/dbconfig/20250415-161335-fceratto.json
  • 16:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2059.codfw.wmnet with reason: host reimage
  • 16:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2059.codfw.wmnet with reason: host reimage
  • 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T391056)', diff saved to https://phabricator.wikimedia.org/P75044 and previous config saved to /var/cache/conftool/dbconfig/20250415-155939-fceratto.json
  • 15:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T391056)', diff saved to https://phabricator.wikimedia.org/P75043 and previous config saved to /var/cache/conftool/dbconfig/20250415-155914-fceratto.json
  • 15:58 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
  • 15:57 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
  • 15:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2059
  • 15:47 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2059
  • 15:47 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2059
  • 15:47 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2059.codfw.wmnet 5.32.192.10.in-addr.arpa 5.0.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:47 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2059.codfw.wmnet 5.32.192.10.in-addr.arpa 5.0.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:47 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:47 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2059 - bking@cumin2002"
  • 15:47 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2059 - bking@cumin2002"
  • 15:45 ladsgroup@deploy1003: Finished scap sync-world: Backport for Revert^2 "Bump thumbnail steps to 95%" (duration: 21m 02s)
  • 15:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P75042 and previous config saved to /var/cache/conftool/dbconfig/20250415-154407-fceratto.json
  • 15:42 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:42 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2059
  • 15:42 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2059.codfw.wmnet with OS bullseye
  • 15:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2059.codfw.wmnet on all recursors
  • 15:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2059.codfw.wmnet on all recursors
  • 15:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2059 to cirrussearch2059
  • 15:41 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2059
  • 15:40 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2059
  • 15:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:40 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2059 to cirrussearch2059 - bking@cumin2002"
  • 15:39 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 15:36 ladsgroup@deploy1003: ladsgroup: Backport for Revert^2 "Bump thumbnail steps to 95%" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P75041 and previous config saved to /var/cache/conftool/dbconfig/20250415-152901-fceratto.json
  • 15:24 ladsgroup@deploy1003: Started scap sync-world: Backport for Revert^2 "Bump thumbnail steps to 95%"
  • 15:22 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2059 to cirrussearch2059 - bking@cumin2002"
  • 15:17 dzahn@deploy1003: Finished deploy [releng/jenkins-deploy@c274545] (releasing): T391590 (duration: 01m 14s)
  • 15:16 dzahn@deploy1003: Started deploy [releng/jenkins-deploy@c274545] (releasing): T391590
  • 15:16 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:16 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2059 to cirrussearch2059
  • 15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T391056)', diff saved to https://phabricator.wikimedia.org/P75038 and previous config saved to /var/cache/conftool/dbconfig/20250415-151354-fceratto.json
  • 15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T391056)', diff saved to https://phabricator.wikimedia.org/P75037 and previous config saved to /var/cache/conftool/dbconfig/20250415-151144-fceratto.json
  • 15:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T391056)', diff saved to https://phabricator.wikimedia.org/P75036 and previous config saved to /var/cache/conftool/dbconfig/20250415-151121-fceratto.json
  • 14:57 sbassett: Undeployed security patch for T391343 (reapplied during recent scap backport, patch now removed from deployment hosts)
  • 14:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P75035 and previous config saved to /var/cache/conftool/dbconfig/20250415-145613-fceratto.json
  • 14:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P75034 and previous config saved to /var/cache/conftool/dbconfig/20250415-144106-fceratto.json
  • 14:40 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 14:39 cgoubert@deploy1003: Finished scap sync-world: Backport for shwiktionary: Add bs as import source (T391621) (duration: 19m 28s)
  • 14:39 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 14:38 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 14:38 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_drmrs
  • 14:33 cgoubert@deploy1003: aleksandar, cgoubert: Continuing with sync
  • 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_drmrs
  • 14:31 cgoubert@deploy1003: aleksandar, cgoubert: Backport for shwiktionary: Add bs as import source (T391621) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T391056)', diff saved to https://phabricator.wikimedia.org/P75033 and previous config saved to /var/cache/conftool/dbconfig/20250415-142558-fceratto.json
  • 14:25 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqiad and A:cp
  • 14:25 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqiad and A:cp
  • 14:25 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in eqiad - T391334
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T391056)', diff saved to https://phabricator.wikimedia.org/P75032 and previous config saved to /var/cache/conftool/dbconfig/20250415-142349-fceratto.json
  • 14:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T391056)', diff saved to https://phabricator.wikimedia.org/P75031 and previous config saved to /var/cache/conftool/dbconfig/20250415-142327-fceratto.json
  • 14:22 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 14:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2065.codfw.wmnet with OS bullseye
  • 14:20 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_esams and not P{cp3073.esams.wmnet} and A:cp
  • 14:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:20 cgoubert@deploy1003: Started scap sync-world: Backport for shwiktionary: Add bs as import source (T391621)
  • 14:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:18 cgoubert@deploy1003: Finished scap sync-world: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695) (duration: 18m 30s)
  • 14:15 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_esams and not P{cp3081.esams.wmnet} and A:cp
  • 14:11 cgoubert@deploy1003: migr, cgoubert: Continuing with sync
  • 14:11 cgoubert@deploy1003: migr, cgoubert: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695) synced to the testservers (https://wikitech.wikimedia
  • 14:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P75030 and previous config saved to /var/cache/conftool/dbconfig/20250415-140820-fceratto.json
  • 14:07 urandom: bootstrapping Cassandra/restbase1044-c — T389423
  • 14:04 sukhe@dns1004: END - running authdns-update
  • 14:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2065.codfw.wmnet with reason: host reimage
  • 14:01 sukhe@dns1004: START - running authdns-update
  • 13:59 cgoubert@deploy1003: Started scap sync-world: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695)
  • 13:59 sukhe@dns1004: END - running authdns-update
  • 13:56 sukhe@dns1004: START - running authdns-update
  • 13:56 cgoubert@deploy1003: Finished scap sync-world: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695) (duration: 18m 27s)
  • 13:55 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
  • 13:55 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2065.codfw.wmnet with reason: host reimage
  • 13:55 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
  • 13:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P75029 and previous config saved to /var/cache/conftool/dbconfig/20250415-135313-fceratto.json
  • 13:49 cgoubert@deploy1003: migr, cgoubert: Continuing with sync
  • 13:49 cgoubert@deploy1003: migr, cgoubert: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695) synced to the testservers (https://wikitech.wikimedia
  • 13:45 tappof@cumin1002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
  • 13:45 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add data::pdus to exports - tappof@cumin1002 - T387231"
  • 13:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2065
  • 13:40 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2065
  • 13:40 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2065
  • 13:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2065.codfw.wmnet 68.32.192.10.in-addr.arpa 8.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:40 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2065.codfw.wmnet 68.32.192.10.in-addr.arpa 8.6.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:40 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2065 - bking@cumin2002"
  • 13:40 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2065 - bking@cumin2002"
  • 13:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T391056)', diff saved to https://phabricator.wikimedia.org/P75028 and previous config saved to /var/cache/conftool/dbconfig/20250415-133807-fceratto.json
  • 13:38 cgoubert@deploy1003: Started scap sync-world: Backport for tests(Mentorship): add coverage for UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): extract sub-queries from UncachedMenteeOverviewDataProvider (T391695), perf(Mentorship): batch filtering mentees in UncachedMenteeOverviewDataProvider (T391695)
  • 13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T391056)', diff saved to https://phabricator.wikimedia.org/P75027 and previous config saved to /var/cache/conftool/dbconfig/20250415-133558-fceratto.json
  • 13:35 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T391056)', diff saved to https://phabricator.wikimedia.org/P75025 and previous config saved to /var/cache/conftool/dbconfig/20250415-133536-fceratto.json
  • 13:34 cgoubert@deploy1003: Finished scap sync-world: Backport for updating wikimaniawiki namespace configurations: (T389729), update wikimaniawiki perms configurations: (T389729) (duration: 28m 46s)
  • 13:30 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2065
  • 13:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2065.codfw.wmnet with OS bullseye
  • 13:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2065.codfw.wmnet on all recursors
  • 13:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2065.codfw.wmnet on all recursors
  • 13:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2065 to cirrussearch2065
  • 13:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2065
  • 13:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2065
  • 13:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2065 to cirrussearch2065 - bking@cumin2002"
  • 13:28 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2065 to cirrussearch2065 - bking@cumin2002"
  • 13:25 cgoubert@deploy1003: cgoubert, robertsky: Continuing with sync
  • 13:23 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2065 to cirrussearch2065
  • 13:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P75024 and previous config saved to /var/cache/conftool/dbconfig/20250415-132029-fceratto.json
  • 13:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row C - bking@cumin2002 - T388610
  • 13:17 cgoubert@deploy1003: cgoubert, robertsky: Backport for updating wikimaniawiki namespace configurations: (T389729), update wikimaniawiki perms configurations: (T389729) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:16 sukhe@dns1004: END - running authdns-update
  • 13:14 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Andy Cooper out of all services on: 2393 hosts
  • 13:13 sukhe@dns1004: START - running authdns-update
  • 13:11 sukhe@dns1004: END - running authdns-update
  • 13:09 sukhe@dns1004: START - running authdns-update
  • 13:05 cgoubert@deploy1003: Started scap sync-world: Backport for updating wikimaniawiki namespace configurations: (T389729), update wikimaniawiki perms configurations: (T389729)
  • 13:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P75023 and previous config saved to /var/cache/conftool/dbconfig/20250415-130522-fceratto.json
  • 13:02 cgoubert@deploy1003: Finished scap sync-world: test rebuild to test swift eventual consistency (duration: 30m 09s)
  • 13:02 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 13:02 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 13:02 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 13:02 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 13:02 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 13:01 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 12:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 12:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 12:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T391056)', diff saved to https://phabricator.wikimedia.org/P75022 and previous config saved to /var/cache/conftool/dbconfig/20250415-125014-fceratto.json
  • 12:49 cgoubert@deploy1003: cgoubert: Continuing with sync
  • 12:49 cgoubert@deploy1003: cgoubert: test rebuild to test swift eventual consistency synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T391056)', diff saved to https://phabricator.wikimedia.org/P75021 and previous config saved to /var/cache/conftool/dbconfig/20250415-124805-fceratto.json
  • 12:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75020 and previous config saved to /var/cache/conftool/dbconfig/20250415-124743-fceratto.json
  • 12:42 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for durum2002.codfw.wmnet
  • 12:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for durum2002.codfw.wmnet
  • 12:33 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Andy Cooper out of all services on: 2393 hosts
  • 12:33 cgoubert@deploy1003: Started scap sync-world: test rebuild to test swift eventual consistency
  • 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P75018 and previous config saved to /var/cache/conftool/dbconfig/20250415-123236-fceratto.json
  • 12:32 cgoubert@deploy1003: Finished scap build-images: (no justification provided) (duration: 05m 27s)
  • 12:26 cgoubert@deploy1003: Started scap build-images: (no justification provided)
  • 12:26 cgoubert@deploy1003: build-images aborted: (no justification provided) (duration: 00m 01s)
  • 12:26 cgoubert@deploy1003: Started scap build-images: (no justification provided)
  • 12:26 cgoubert@deploy1003: build-images aborted: (no justification provided) (duration: 01m 12s)
  • 12:25 cgoubert@deploy1003: Started scap build-images: (no justification provided)
  • 12:21 godog: upgrade thanos to 0.38.0 on O:prometheus::pop - T383966
  • 12:20 godog: upgrade thanos to 0.38.0 on O:prometheus::pop
  • 12:20 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum2002.codfw.wmnet with OS bookworm
  • 12:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P75017 and previous config saved to /var/cache/conftool/dbconfig/20250415-121728-fceratto.json
  • 12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75016 and previous config saved to /var/cache/conftool/dbconfig/20250415-120222-fceratto.json
  • 12:01 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 12:00 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T391056)', diff saved to https://phabricator.wikimedia.org/P75015 and previous config saved to /var/cache/conftool/dbconfig/20250415-120013-fceratto.json
  • 12:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 11:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:45 sukhe: sudo cumin 'A:durum and not P{durum2002*}' 'run-puppet-agent --enable "rolling out CR 1132669"'
  • 11:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75014 and previous config saved to /var/cache/conftool/dbconfig/20250415-114501-fceratto.json
  • 11:42 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum2002.codfw.wmnet with OS bookworm
  • 11:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P75013 and previous config saved to /var/cache/conftool/dbconfig/20250415-112955-fceratto.json
  • 11:25 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 11:25 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 11:25 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 11:25 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 11:24 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 11:24 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 11:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P75012 and previous config saved to /var/cache/conftool/dbconfig/20250415-111447-fceratto.json
  • 11:08 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_esams and not P{cp3081.esams.wmnet} and A:cp
  • 11:08 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_esams and not P{cp3073.esams.wmnet} and A:cp
  • 11:07 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in esams - T391334
  • 11:07 cgoubert@deploy1003: Started scap sync-world: test rebuild to look at logs
  • 11:07 sukhe@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on durum2002.codfw.wmnet with reason: testing
  • 11:05 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp[5023-5024].eqsin.wmnet} and A:cp
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75011 and previous config saved to /var/cache/conftool/dbconfig/20250415-105941-fceratto.json
  • 10:58 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_eqsin
  • 10:52 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_drmrs
  • 10:52 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_drmrs
  • 10:52 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in drmrs - T391334
  • 10:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T391056)', diff saved to https://phabricator.wikimedia.org/P75010 and previous config saved to /var/cache/conftool/dbconfig/20250415-104235-fceratto.json
  • 10:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T391056)', diff saved to https://phabricator.wikimedia.org/P75009 and previous config saved to /var/cache/conftool/dbconfig/20250415-104212-fceratto.json
  • 10:41 sukhe: enable puppet on durum2002
  • 10:40 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_codfw
  • 10:39 ladsgroup@deploy1003: sync-world aborted: Backport for Bump thumbnail steps to 95% (T360589) (duration: 05m 08s)
  • 10:38 sukhe: sudo cumin 'A:durum' 'disable-puppet "rolling out CR 1132669"'
  • 10:37 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_codfw
  • 10:34 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 95% (T360589)
  • 10:33 ladsgroup@deploy1003: sync-world aborted: Backport for Bump thumbnail steps to 95% (T360589) (duration: 14m 11s)
  • 10:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P75008 and previous config saved to /var/cache/conftool/dbconfig/20250415-102705-fceratto.json
  • 10:26 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp[5023-5024].eqsin.wmnet} and A:cp
  • 10:24 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-text_eqsin
  • 10:19 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 95% (T360589)
  • 10:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P75007 and previous config saved to /var/cache/conftool/dbconfig/20250415-101158-fceratto.json
  • 10:00 dcausse@deploy1003: Finished deploy [wdqs/wdqs@fe88851] (wcqs): version 0.3.156 (duration: 02m 25s)
  • 09:58 dcausse@deploy1003: Started deploy [wdqs/wdqs@fe88851] (wcqs): version 0.3.156
  • 09:57 dcausse@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: version 0.3.156 (T326311) (duration: 14m 31s)
  • 09:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T391056)', diff saved to https://phabricator.wikimedia.org/P75006 and previous config saved to /var/cache/conftool/dbconfig/20250415-095650-fceratto.json
  • 09:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T391056)', diff saved to https://phabricator.wikimedia.org/P75005 and previous config saved to /var/cache/conftool/dbconfig/20250415-095442-fceratto.json
  • 09:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:43 dcausse@deploy1003: Started deploy [wdqs/wdqs@fe88851]: version 0.3.156 (T326311)
  • 09:15 jnuche@deploy1003: sync-world aborted: testwikis to 1.44.0-wmf.25 refs T386220 (duration: 14m 36s)
  • 09:01 jnuche@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.25 refs T386220
  • 08:51 dcausse@deploy1003: Finished deploy [wdqs/wdqs@4186ae7] (wcqs): test deploy new scap config to wcqs2001.codfw.wmnet (T221709) (duration: 00m 20s)
  • 08:51 dcausse@deploy1003: Started deploy [wdqs/wdqs@4186ae7] (wcqs): test deploy new scap config to wcqs2001.codfw.wmnet (T221709)
  • 08:42 XioNoX: drain arelion eqsin-codfw link
  • 08:09 dcausse@deploy1003: Finished deploy [wdqs/wdqs@4186ae7]: test deploy new scap config to wdqs2025.codfw.wmnet (T221709) (duration: 00m 18s)
  • 08:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:09 dcausse@deploy1003: Started deploy [wdqs/wdqs@4186ae7]: test deploy new scap config to wdqs2025.codfw.wmnet (T221709)
  • 08:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 07:47 godog: upgrade thanos to 0.38.0 on prometheus100[57] - T383966
  • 07:28 Emperor: make sure all disks are mounted correctly prior to disk-swap testing T391854 ms-be1091
  • 07:28 Emperor: make sure all disks are mounted correctly prior to disk-swap testing T391854
  • 07:10 elukey@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ms-be1091.eqiad.wmnet with reason: dcops maintenance
  • 07:06 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_codfw
  • 07:06 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_codfw
  • 07:06 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqsin
  • 07:05 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:05 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqsin
  • 07:04 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in eqsin and codfw - T391334
  • 06:50 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:49 kart_: Updated cxserver to 2025-04-07-053106-production (T390732, T390711)
  • 06:48 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:47 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:46 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:45 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:45 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:44 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc6 T391454', diff saved to https://phabricator.wikimedia.org/P75003 and previous config saved to /var/cache/conftool/dbconfig/20250415-050307-marostegui.json
  • 04:57 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2016.codfw.wmnet,pc1016.eqiad.wmnet with reason: Maintenance
  • 04:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc6 T391454', diff saved to https://phabricator.wikimedia.org/P75002 and previous config saved to /var/cache/conftool/dbconfig/20250415-045700-marostegui.json
  • 04:10 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.22 (duration: 10m 03s)
  • 03:43 mwpresync@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.44.0-wmf.24,1.44.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discov
  • 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.25 refs T386220
  • 02:32 ejegg: payments-wiki upgraded from ef9284aa to ba6e8d65
  • 02:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1181.eqiad.wmnet with OS bullseye
  • 01:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
  • 01:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1181']
  • 01:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1181']
  • 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 01:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL

2025-04-14

  • 23:22 urandom: bootstrapping Cassandra/restbase1044-b — T389423
  • 23:12 zabe: zabe@mwmaint1002:~$ cat group2.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php {} --delete /home/zabe/afl_text_table_deletedump/{} --sleep 0.3" # T381599
  • 22:44 ladsgroup@dns1004: END - running authdns-update
  • 22:42 ladsgroup@dns1004: START - running authdns-update
  • 22:34 mutante: deploy1003 - scap install-world -l release2003.codfw.wmnet T391590
  • 22:34 dzahn@deploy1003: Installation of scap version "4.153.0" completed for 1 hosts
  • 22:33 dzahn@deploy1003: Installing scap version "4.153.0" for 1 host(s)
  • 22:30 sbassett: Deployed previous good versions of affected files for T391343
  • 22:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 22:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T391056)', diff saved to https://phabricator.wikimedia.org/P75001 and previous config saved to /var/cache/conftool/dbconfig/20250414-222519-fceratto.json
  • 22:20 sbassett: Deployment of security patch for T391343 halted
  • 22:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P75000 and previous config saved to /var/cache/conftool/dbconfig/20250414-221012-fceratto.json
  • 22:06 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2060.codfw.wmnet|cirrussearch2067.codfw.wmnet|cirrussearch2068.codfw.wmnet|cirrussearch2072.codfw.wmnet|cirrussearch2085.codfw.wmnet|cirrussearch2104.codfw.wmnet|cirrussearch2105.codfw.wmnet|cirrussearch2107.codfw.wmnet|cirrussearch2109.codfw.wmnet|cirrussearch2114.codfw.wmnet|cirrussearch2115.codfw.wmnet
  • 21:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P74999 and previous config saved to /var/cache/conftool/dbconfig/20250414-215504-fceratto.json
  • 21:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T391056)', diff saved to https://phabricator.wikimedia.org/P74998 and previous config saved to /var/cache/conftool/dbconfig/20250414-213957-fceratto.json
  • 21:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T391056)', diff saved to https://phabricator.wikimedia.org/P74997 and previous config saved to /var/cache/conftool/dbconfig/20250414-212344-fceratto.json
  • 21:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2227.codfw.wmnet with reason: Maintenance
  • 21:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T391056)', diff saved to https://phabricator.wikimedia.org/P74996 and previous config saved to /var/cache/conftool/dbconfig/20250414-212320-fceratto.json
  • 21:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P74995 and previous config saved to /var/cache/conftool/dbconfig/20250414-210814-fceratto.json
  • 20:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P74994 and previous config saved to /var/cache/conftool/dbconfig/20250414-205307-fceratto.json
  • 20:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T391056)', diff saved to https://phabricator.wikimedia.org/P74993 and previous config saved to /var/cache/conftool/dbconfig/20250414-203800-fceratto.json
  • 20:23 jforrester@deploy1003: Finished scap sync-world: Backport for FunctionCalls: Use base64url encoding rather than raw base64 (T391584), FunctionCalls: Don't error if Wikifunctions.org isn't in client mode yet (T391584), FunctionCalls: Throw an explicable error if json_encode returns null (T391584) (duration: 14m 20s)
  • 20:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T391056)', diff saved to https://phabricator.wikimedia.org/P74992 and previous config saved to /var/cache/conftool/dbconfig/20250414-202152-fceratto.json
  • 20:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 20:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T391056)', diff saved to https://phabricator.wikimedia.org/P74991 and previous config saved to /var/cache/conftool/dbconfig/20250414-202131-fceratto.json
  • 20:17 jforrester@deploy1003: jforrester: Continuing with sync
  • 20:14 jforrester@deploy1003: jforrester: Backport for FunctionCalls: Use base64url encoding rather than raw base64 (T391584), FunctionCalls: Don't error if Wikifunctions.org isn't in client mode yet (T391584), FunctionCalls: Throw an explicable error if json_encode returns null (T391584) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:09 jforrester@deploy1003: Started scap sync-world: Backport for FunctionCalls: Use base64url encoding rather than raw base64 (T391584), FunctionCalls: Don't error if Wikifunctions.org isn't in client mode yet (T391584), FunctionCalls: Throw an explicable error if json_encode returns null (T391584)
  • 20:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P74990 and previous config saved to /var/cache/conftool/dbconfig/20250414-200624-fceratto.json
  • 20:02 mforns@deploy1003: Finished deploy [analytics/refinery@6fe5a7e] (thin): Regular analytics weekly train THIN [analytics/refinery@6fe5a7e3] (duration: 01m 09s)
  • 20:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2109.codfw.wmnet with OS bullseye
  • 20:01 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row D - bking@cumin2002 - T388610
  • 20:01 mforns@deploy1003: Started deploy [analytics/refinery@6fe5a7e] (thin): Regular analytics weekly train THIN [analytics/refinery@6fe5a7e3]
  • 20:00 mforns@deploy1003: Finished deploy [analytics/refinery@6fe5a7e]: Regular analytics weekly train [analytics/refinery@6fe5a7e3] (duration: 03m 31s)
  • 19:57 mforns@deploy1003: Started deploy [analytics/refinery@6fe5a7e]: Regular analytics weekly train [analytics/refinery@6fe5a7e3]
  • 19:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P74989 and previous config saved to /var/cache/conftool/dbconfig/20250414-195117-fceratto.json
  • 19:50 mforns@deploy1003: Finished deploy [analytics/refinery@6fe5a7e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6fe5a7e3] (duration: 02m 44s)
  • 19:47 mforns@deploy1003: Started deploy [analytics/refinery@6fe5a7e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6fe5a7e3]
  • 19:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2109.codfw.wmnet with reason: host reimage
  • 19:36 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2109.codfw.wmnet with reason: host reimage
  • 19:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T391056)', diff saved to https://phabricator.wikimedia.org/P74988 and previous config saved to /var/cache/conftool/dbconfig/20250414-193610-fceratto.json
  • 19:35 mforns@deploy1003: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
  • 19:35 mforns@deploy1003: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
  • 19:35 mforns@deploy1003: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 19:34 mforns@deploy1003: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 19:31 urandom: dropped & recreated 8 commons impact metrics tables — https://phabricator.wikimedia.org/T370470#10687053
  • 19:24 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 19:24 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 19:24 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 19:23 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 19:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2109
  • 19:20 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2109
  • 19:19 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T391056)', diff saved to https://phabricator.wikimedia.org/P74987 and previous config saved to /var/cache/conftool/dbconfig/20250414-191957-fceratto.json
  • 19:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 19:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T391056)', diff saved to https://phabricator.wikimedia.org/P74986 and previous config saved to /var/cache/conftool/dbconfig/20250414-191933-fceratto.json
  • 19:17 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2109
  • 19:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2109.codfw.wmnet 160.48.192.10.in-addr.arpa 0.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:17 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2109.codfw.wmnet 160.48.192.10.in-addr.arpa 0.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:17 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2109 - bking@cumin2002"
  • 19:17 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2109 - bking@cumin2002"
  • 19:13 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:10 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2109
  • 19:10 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2109.codfw.wmnet with OS bullseye
  • 19:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2109 to cirrussearch2109
  • 19:07 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2109
  • 19:07 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2109
  • 19:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:07 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2109 to cirrussearch2109 - bking@cumin2002"
  • 19:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2109 to cirrussearch2109 - bking@cumin2002"
  • 19:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P74985 and previous config saved to /var/cache/conftool/dbconfig/20250414-190426-fceratto.json
  • 19:02 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:02 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2109 to cirrussearch2109
  • 18:55 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row D - bking@cumin2002 - T388610
  • 18:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P74984 and previous config saved to /var/cache/conftool/dbconfig/20250414-184918-fceratto.json
  • 18:37 jforrester@deploy1003: Finished scap sync-world: Backport for Complete our RecentChanges entry generation and formatting (T386020), Switch test Wikifunctions client deployment from test2wiki to test2iki (T391584), Document Wikifunctions options, adding wgWikiLambdaClientModeOffline (T391584) (duration: 32m 25s)
  • 18:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T391056)', diff saved to https://phabricator.wikimedia.org/P74983 and previous config saved to /var/cache/conftool/dbconfig/20250414-183411-fceratto.json
  • 18:27 jforrester@deploy1003: jforrester: Continuing with sync
  • 18:27 James_F: Run `mwscript sql --wiki=testwiki /srv/mediawiki-staging/php-1.44.0-wmf.24/extensions/WikiLambda/sql/mysql/table-usage.sql` for T391885
  • 18:24 jforrester@deploy1003: jforrester: Backport for Complete our RecentChanges entry generation and formatting (T386020), Switch test Wikifunctions client deployment from test2wiki to test2iki (T391584), Document Wikifunctions options, adding wgWikiLambdaClientModeOffline (T391584) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T391056)', diff saved to https://phabricator.wikimedia.org/P74982 and previous config saved to /var/cache/conftool/dbconfig/20250414-181802-fceratto.json
  • 18:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 18:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74981 and previous config saved to /var/cache/conftool/dbconfig/20250414-181740-fceratto.json
  • 18:05 jforrester@deploy1003: Started scap sync-world: Backport for Complete our RecentChanges entry generation and formatting (T386020), Switch test Wikifunctions client deployment from test2wiki to test2iki (T391584), Document Wikifunctions options, adding wgWikiLambdaClientModeOffline (T391584)
  • 18:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P74980 and previous config saved to /var/cache/conftool/dbconfig/20250414-180232-fceratto.json
  • 17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P74979 and previous config saved to /var/cache/conftool/dbconfig/20250414-174725-fceratto.json
  • 17:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74978 and previous config saved to /var/cache/conftool/dbconfig/20250414-173218-fceratto.json
  • 17:30 swfrench-wmf: running: cumin -b8 -s60 'A:cp-text' 'run-puppet-agent -e "merging ATS config change - T391421"'
  • 17:26 hashar@deploy1003: Finished deploy [integration/docroot@e92740c]: opensource: remove OOjs Router - T358813 (duration: 00m 10s)
  • 17:25 hashar@deploy1003: Started deploy [integration/docroot@e92740c]: opensource: remove OOjs Router - T358813
  • 17:25 swfrench-wmf: running: run-puppet-agent -e "merging ATS config change - T391421" on cp4040
  • 17:20 swfrench-wmf: running: cumin 'A:cp-text' 'disable-puppet "merging ATS config change - T391421"'
  • 17:17 swfrench@deploy1003: Finished scap sync-world: Backport for Remove PHP 8.1 migration WikimediaEvents settings (T391421) (duration: 13m 10s)
  • 17:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74977 and previous config saved to /var/cache/conftool/dbconfig/20250414-171622-fceratto.json
  • 17:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T391056)', diff saved to https://phabricator.wikimedia.org/P74976 and previous config saved to /var/cache/conftool/dbconfig/20250414-171558-fceratto.json
  • 17:10 swfrench@deploy1003: swfrench: Continuing with sync
  • 17:10 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1181.eqiad.wmnet with OS bullseye
  • 17:08 swfrench@deploy1003: swfrench: Backport for Remove PHP 8.1 migration WikimediaEvents settings (T391421) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:04 swfrench@deploy1003: Started scap sync-world: Backport for Remove PHP 8.1 migration WikimediaEvents settings (T391421)
  • 17:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P74975 and previous config saved to /var/cache/conftool/dbconfig/20250414-170052-fceratto.json
  • 16:56 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_magru
  • 16:56 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_magru
  • 16:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P74974 and previous config saved to /var/cache/conftool/dbconfig/20250414-164545-fceratto.json
  • 16:38 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
  • 16:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T391056)', diff saved to https://phabricator.wikimedia.org/P74973 and previous config saved to /var/cache/conftool/dbconfig/20250414-163037-fceratto.json
  • 16:21 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1181.eqiad.wmnet with OS bullseye
  • 16:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T391056)', diff saved to https://phabricator.wikimedia.org/P74972 and previous config saved to /var/cache/conftool/dbconfig/20250414-161512-fceratto.json
  • 16:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 16:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T391056)', diff saved to https://phabricator.wikimedia.org/P74971 and previous config saved to /var/cache/conftool/dbconfig/20250414-161432-fceratto.json
  • 16:06 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
  • 16:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_ulsfo and not P{cp4037.ulsfo.wmnet} and A:cp
  • 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P74970 and previous config saved to /var/cache/conftool/dbconfig/20250414-155925-fceratto.json
  • 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:57 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1181.eqiad.wmnet with OS bullseye
  • 15:56 fceratto@dns1004: END - running authdns-update
  • 15:53 fceratto@dns1004: START - running authdns-update
  • 15:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P74969 and previous config saved to /var/cache/conftool/dbconfig/20250414-154419-fceratto.json
  • 15:44 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:43 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 15:40 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 urandom: bootstrapping Cassandra/restbase1044-a — T389423
  • 15:37 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 15:33 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1044.eqiad.wmnet with reason: Bootstrapping — T389423
  • 15:30 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_ulsfo and not P{cp4047.ulsfo.wmnet} and not P{cp4045.ulsfo.wmnet} and A:cp
  • 15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T391056)', diff saved to https://phabricator.wikimedia.org/P74968 and previous config saved to /var/cache/conftool/dbconfig/20250414-152911-fceratto.json
  • 15:26 volans: deployed homer v0.9.0 to cumin hosts
  • 15:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
  • 15:25 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.9.0 - volans@cumin1002
  • 15:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
  • 15:23 volans@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.9.0 - volans@cumin1002
  • 15:15 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
  • 15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T391056)', diff saved to https://phabricator.wikimedia.org/P74967 and previous config saved to /var/cache/conftool/dbconfig/20250414-151316-fceratto.json
  • 15:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 15:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T391056)', diff saved to https://phabricator.wikimedia.org/P74966 and previous config saved to /var/cache/conftool/dbconfig/20250414-150200-fceratto.json
  • 14:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P74965 and previous config saved to /var/cache/conftool/dbconfig/20250414-144653-fceratto.json
  • 14:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P74964 and previous config saved to /var/cache/conftool/dbconfig/20250414-143146-fceratto.json
  • 14:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2104.codfw.wmnet with OS bullseye
  • 14:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T391056)', diff saved to https://phabricator.wikimedia.org/P74963 and previous config saved to /var/cache/conftool/dbconfig/20250414-141639-fceratto.json
  • 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T391056)', diff saved to https://phabricator.wikimedia.org/P74962 and previous config saved to /var/cache/conftool/dbconfig/20250414-141227-fceratto.json
  • 14:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T391056)', diff saved to https://phabricator.wikimedia.org/P74961 and previous config saved to /var/cache/conftool/dbconfig/20250414-141148-fceratto.json
  • 14:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2104.codfw.wmnet with reason: host reimage
  • 14:01 godog: temp disable "backend time" panel using unaggregated big mediawiki metric on "reading web performance" dashboard - T391677
  • 14:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2104.codfw.wmnet with reason: host reimage
  • 13:57 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1178.eqiad.wmnet with OS bullseye
  • 13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P74960 and previous config saved to /var/cache/conftool/dbconfig/20250414-135640-fceratto.json
  • 13:47 arnaudb@cumin1002: END (ERROR) - Cookbook sre.gerrit.failover (exit_code=97) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 13:47 arnaudb@cumin1002: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P74956 and previous config saved to /var/cache/conftool/dbconfig/20250414-134132-fceratto.json
  • 13:41 TheresNoTime: UTC afternoon backport window done
  • 13:40 samtar@deploy1003: Finished scap sync-world: Backport for Enable SUL3 on most remaining beta cluster wikis, punjabiwikimedia, maiwikimedia: fix tagline (T348611) (duration: 12m 00s)
  • 13:38 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia nginx_1.22.1-9+deb12u1+ech2_amd64.changes: T205378
  • 13:33 samtar@deploy1003: matmarex, anzx, samtar: Continuing with sync
  • 13:33 samtar@deploy1003: matmarex, anzx, samtar: Backport for Enable SUL3 on most remaining beta cluster wikis, punjabiwikimedia, maiwikimedia: fix tagline (T348611) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2104
  • 13:30 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2104
  • 13:30 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2104.codfw.wmnet with OS bullseye
  • 13:28 samtar@deploy1003: Started scap sync-world: Backport for Enable SUL3 on most remaining beta cluster wikis, punjabiwikimedia, maiwikimedia: fix tagline (T348611)
  • 13:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cirrussearch2014 to cirrussearch2104
  • 13:27 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2104
  • 13:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2104
  • 13:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:27 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cirrussearch2014 to cirrussearch2104 - bking@cumin2002"
  • 13:26 samtar@deploy1003: Finished scap sync-world: Backport for CentralAuthTokenManager: Log failures for write operations (T390784) (duration: 11m 39s)
  • 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T391056)', diff saved to https://phabricator.wikimedia.org/P74955 and previous config saved to /var/cache/conftool/dbconfig/20250414-132625-fceratto.json
  • 13:23 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Compatibility with conftool 5.1.0 (take 2) - oblivian@cumin2002"
  • 13:23 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Compatibility with conftool 5.1.0 (take 2) - oblivian@cumin2002
  • 13:22 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Compatibility with conftool 5.1.0 (take 2) - oblivian@cumin2002
  • 13:22 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Compatibility with conftool 5.1.0 (take 2) - oblivian@cumin2002"
  • 13:22 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cirrussearch2014 to cirrussearch2104 - bking@cumin2002"
  • 13:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T391056)', diff saved to https://phabricator.wikimedia.org/P74954 and previous config saved to /var/cache/conftool/dbconfig/20250414-132232-fceratto.json
  • 13:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T391056)', diff saved to https://phabricator.wikimedia.org/P74953 and previous config saved to /var/cache/conftool/dbconfig/20250414-132210-fceratto.json
  • 13:22 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Compatibility with conftool 5.1.0 - oblivian@cumin2002"
  • 13:22 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Compatibility with conftool 5.1.0 - oblivian@cumin2002
  • 13:21 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Compatibility with conftool 5.1.0 - oblivian@cumin2002
  • 13:21 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Compatibility with conftool 5.1.0 - oblivian@cumin2002"
  • 13:19 samtar@deploy1003: samtar, matmarex: Continuing with sync
  • 13:19 samtar@deploy1003: samtar, matmarex: Backport for CentralAuthTokenManager: Log failures for write operations (T390784) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:18 bking@cumin2002: START - Cookbook sre.hosts.rename from cirrussearch2014 to cirrussearch2104
  • 13:17 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in magru - T391334
  • 13:17 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_magru
  • 13:16 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_magru
  • 13:15 _joe_: installed updates to conftool on cumin hosts
  • 13:14 samtar@deploy1003: Started scap sync-world: Backport for CentralAuthTokenManager: Log failures for write operations (T390784)
  • 13:13 elukey@deploy1003: Finished deploy [docker-pkg/deploy@a555b7b]: Upgrade to 4.0.4 (duration: 00m 38s)
  • 13:13 elukey@deploy1003: Started deploy [docker-pkg/deploy@a555b7b]: Upgrade to 4.0.4
  • 13:13 godog: remove old LVs from prometheus[12]00[56] - T383232
  • 13:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P74952 and previous config saved to /var/cache/conftool/dbconfig/20250414-130703-fceratto.json
  • 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc5 T391454', diff saved to https://phabricator.wikimedia.org/P74951 and previous config saved to /var/cache/conftool/dbconfig/20250414-130222-marostegui.json
  • 13:01 moritzm: remove ganeti01.svc.eqiad.wmnet cert (replaced by cfssl cert) T357750
  • 12:56 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_ulsfo and not P{cp4037.ulsfo.wmnet} and A:cp
  • 12:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: Maintenance
  • 12:56 jforrester@deploy1003: Finished scap sync-world: Backport for Special pages: Don't just set userCanExecute() but actually run it (T391594), Client mode: Provide WikiLambdaClientModeOffline for SRE to disable, Wikifunctions VE: Add loading and abort state to content editable (T391441), (gerrit:1136126) logging: Allow through WikiLambdaClient logs at info level an
  • 12:56 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_ulsfo and not P{cp4047.ulsfo.wmnet} and not P{cp4045.ulsfo.wmnet} and A:cp
  • 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc5 T391454', diff saved to https://phabricator.wikimedia.org/P74950 and previous config saved to /var/cache/conftool/dbconfig/20250414-125511-marostegui.json
  • 12:53 moritzm: remove ganeti01.svc.codfw.wmnet cert (replaced by cfssl cert) T357750
  • 12:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P74949 and previous config saved to /var/cache/conftool/dbconfig/20250414-125156-fceratto.json
  • 12:51 godog: upgrade prometheus2007 to thanos 0.38.0 - T383966
  • 12:50 godog: upgrade prometheus2005 to thanos 0.38.0 - T383966
  • 12:49 moritzm: remove ganeti01.svc.esams.wmnet cert (replaced by cfssl cert) T357750
  • 12:46 jforrester@deploy1003: jforrester: Continuing with sync
  • 12:46 moritzm: remove ganeti01.svc.ulsfo.wmnet cert (replaced by cfssl cert) T357750
  • 12:44 jforrester@deploy1003: jforrester: Backport for Special pages: Don't just set userCanExecute() but actually run it (T391594), Client mode: Provide WikiLambdaClientModeOffline for SRE to disable, Wikifunctions VE: Add loading and abort state to content editable (T391441), logging: Allow through WikiLambdaClient logs at info level and above sync
  • 12:43 moritzm: remove ganeti01.svc.eqsin.wmnet cert (replaced by cfssl cert) T357750
  • 12:36 jforrester@deploy1003: Started scap sync-world: Backport for Special pages: Don't just set userCanExecute() but actually run it (T391594), Client mode: Provide WikiLambdaClientModeOffline for SRE to disable, Wikifunctions VE: Add loading and abort state to content editable (T391441), (gerrit:1136126) logging: Allow through WikiLambdaClient logs at info level and
  • 12:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T391056)', diff saved to https://phabricator.wikimedia.org/P74948 and previous config saved to /var/cache/conftool/dbconfig/20250414-123649-fceratto.json
  • 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T391056)', diff saved to https://phabricator.wikimedia.org/P74947 and previous config saved to /var/cache/conftool/dbconfig/20250414-123255-fceratto.json
  • 12:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T391056)', diff saved to https://phabricator.wikimedia.org/P74946 and previous config saved to /var/cache/conftool/dbconfig/20250414-123234-fceratto.json
  • 12:25 cgoubert@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:24 cgoubert@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:24 cgoubert@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:23 cgoubert@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:22 cgoubert@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:22 cgoubert@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P74945 and previous config saved to /var/cache/conftool/dbconfig/20250414-121726-fceratto.json
  • 12:06 jforrester@deploy1003: Started scap sync-world: Backport for Special pages: Don't just set userCanExecute() but actually run it (T391594), Client mode: Provide WikiLambdaClientModeOffline for SRE to disable, Wikifunctions VE: Add loading and abort state to content editable (T391441), (gerrit:1136126) logging: Allow through WikiLambdaClient logs at info level and
  • 12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P74944 and previous config saved to /var/cache/conftool/dbconfig/20250414-120219-fceratto.json
  • 11:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T391056)', diff saved to https://phabricator.wikimedia.org/P74943 and previous config saved to /var/cache/conftool/dbconfig/20250414-114711-fceratto.json
  • 11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T391056)', diff saved to https://phabricator.wikimedia.org/P74942 and previous config saved to /var/cache/conftool/dbconfig/20250414-114323-fceratto.json
  • 11:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74941 and previous config saved to /var/cache/conftool/dbconfig/20250414-114300-fceratto.json
  • 11:40 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
  • 11:30 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp4045.ulsfo.wmnet} and A:cp
  • 11:30 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp4037.ulsfo.wmnet} and A:cp
  • 11:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:28 fceratto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 11:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P74940 and previous config saved to /var/cache/conftool/dbconfig/20250414-112754-fceratto.json
  • 11:27 fceratto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
  • 11:26 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:26 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:25 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp4045.ulsfo.wmnet} and A:cp
  • 11:25 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:25 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp4037.ulsfo.wmnet} and A:cp
  • 11:25 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:24 vgutierrez: upload varnishkafka 1.2.0-3 to apt.wm.o (bullseye-wikimedia) - T391334
  • 11:20 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:20 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:19 fceratto@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:19 fceratto@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P74939 and previous config saved to /var/cache/conftool/dbconfig/20250414-111247-fceratto.json
  • 11:12 moritzm: restart spamassassin on lists* to pick up Perl security updates
  • 10:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 90% (T360589), CommonSettings: remove outdated SecurePoll comment (T209892) (duration: 17m 26s)
  • 10:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74938 and previous config saved to /var/cache/conftool/dbconfig/20250414-105741-fceratto.json
  • 10:57 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
  • 10:53 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74937 and previous config saved to /var/cache/conftool/dbconfig/20250414-105351-fceratto.json
  • 10:53 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74936 and previous config saved to /var/cache/conftool/dbconfig/20250414-105329-fceratto.json
  • 10:53 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-text_ulsfo
  • 10:52 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-upload_ulsfo and not P{cp4047.ulsfo.wmnet} and A:cp
  • 10:49 ladsgroup@deploy1003: ladsgroup, novemlinguae: Continuing with sync
  • 10:48 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_ulsfo and not P{cp4047.ulsfo.wmnet} and A:cp
  • 10:48 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_ulsfo
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74935 and previous config saved to /var/cache/conftool/dbconfig/20250414-104758-root.json
  • 10:47 ladsgroup@deploy1003: ladsgroup, novemlinguae: Backport for Bump thumbnail steps to 90% (T360589), CommonSettings: remove outdated SecurePoll comment (T209892) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:44 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
  • 10:43 vgutierrez: rolling upgrade to varnish 7.1.1-1.1~bpo11+wmf3 in ulsfo - T391334
  • 10:42 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
  • 10:41 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 90% (T360589), CommonSettings: remove outdated SecurePoll comment (T209892)
  • 10:40 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 10:40 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 10:40 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 10:40 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 10:35 vgutierrez: upload varnish 7.1.1-1.1~bpo11+wmf3 to apt.wm.o (bullseye-wikimedia) - T391334
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74933 and previous config saved to /var/cache/conftool/dbconfig/20250414-103253-root.json
  • 10:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P74932 and previous config saved to /var/cache/conftool/dbconfig/20250414-102316-fceratto.json
  • 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P74931 and previous config saved to /var/cache/conftool/dbconfig/20250414-101748-root.json
  • 10:15 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 90% (T360589), CommonSettings: remove outdated SecurePoll comment (T209892)
  • 10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74930 and previous config saved to /var/cache/conftool/dbconfig/20250414-100809-fceratto.json
  • 10:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74929 and previous config saved to /var/cache/conftool/dbconfig/20250414-100412-fceratto.json
  • 10:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74928 and previous config saved to /var/cache/conftool/dbconfig/20250414-100242-root.json
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc1', diff saved to https://phabricator.wikimedia.org/P74927 and previous config saved to /var/cache/conftool/dbconfig/20250414-100135-marostegui.json
  • 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc1', diff saved to https://phabricator.wikimedia.org/P74925 and previous config saved to /var/cache/conftool/dbconfig/20250414-100038-marostegui.json
  • 09:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P74924 and previous config saved to /var/cache/conftool/dbconfig/20250414-094737-root.json
  • 09:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2220 gradually with 4 steps - Finished upgrading host
  • 09:33 vgutierrez: restarting acme-chief API servers to catch up on liblzma updates
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P74922 and previous config saved to /var/cache/conftool/dbconfig/20250414-093232-root.json
  • 09:31 vgutierrez: restarting acme-chief to catch up on liblzma updates
  • 09:21 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2230.codfw.wmnet
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74919 and previous config saved to /var/cache/conftool/dbconfig/20250414-091727-root.json
  • 09:15 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2230.codfw.wmnet
  • 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P74917 and previous config saved to /var/cache/conftool/dbconfig/20250414-090222-root.json
  • 09:00 XioNoX: gnmic: bump `num-workers` to 16 on netflow1002 - T388641
  • 08:48 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2220 gradually with 4 steps - Finished upgrading host
  • 08:47 moritzm: installing Postgres 15 security updates
  • 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74914 and previous config saved to /var/cache/conftool/dbconfig/20250414-084716-root.json
  • 08:46 fabfur: enable-puppet on A:cp (T391670)
  • 08:45 moritzm: restart Postfix/Dovecot on outbound MXes to pick up xz security updates
  • 08:41 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1178.eqiad.wmnet with OS bullseye
  • 08:40 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 08:39 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
  • 08:39 moritzm: restarting ircstream on irc1003, clients will reconnect automatically
  • 08:39 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2220.codfw.wmnet
  • 08:36 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
  • 08:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp1111.eqiad.wmnet
  • 08:34 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 08:32 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2220.codfw.wmnet
  • 08:31 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp1111.eqiad.wmnet
  • 08:31 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2220 - Upgrading host
  • 08:30 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2220 - Upgrading host
  • 08:27 fabfur: disable-puppet on A:cp to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/1135827 (T391670)
  • 08:26 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 08:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host db1178.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P74912 and previous config saved to /var/cache/conftool/dbconfig/20250414-082235-marostegui.json
  • 08:20 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1178.eqiad.wmnet with OS bullseye
  • 08:11 moritzm: restarting clamav on vrts to pick up liblzma security updates
  • 07:58 moritzm: rebalance ganeti/B T391243
  • 07:53 XioNoX: gnmic: bump `num-workers` to 12 on netflow1002 - T388641
  • 07:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
  • 07:42 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
  • 07:39 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: sync
  • 07:37 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: sync
  • 07:37 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: sync
  • 07:36 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/proton: sync
  • 07:27 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: sync
  • 07:27 elukey@deploy1003: helmfile [staging] START helmfile.d/services/proton: sync
  • 07:26 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host db1178.eqiad.wmnet with OS bullseye
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc4 T391454', diff saved to https://phabricator.wikimedia.org/P74911 and previous config saved to /var/cache/conftool/dbconfig/20250414-072437-marostegui.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc4 T391454', diff saved to https://phabricator.wikimedia.org/P74910 and previous config saved to /var/cache/conftool/dbconfig/20250414-071653-marostegui.json
  • 07:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2014.codfw.wmnet,pc1014.eqiad.wmnet with reason: Maintenance
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1044.eqiad.wmnet
  • 07:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1044.eqiad.wmnet
  • 07:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc3 T391454', diff saved to https://phabricator.wikimedia.org/P74909 and previous config saved to /var/cache/conftool/dbconfig/20250414-070220-marostegui.json
  • 07:01 moritzm: installing subversion security updates
  • 06:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
  • 06:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance
  • 06:54 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc3 T391454', diff saved to https://phabricator.wikimedia.org/P74908 and previous config saved to /var/cache/conftool/dbconfig/20250414-065203-marostegui.json
  • 06:51 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
  • 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
  • 06:41 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 06:15 moritzm: installing perl security updates
  • 06:12 _joe_: uploaded conftool 5.1.0

2025-04-12

  • 19:16 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:12 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:19 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:08 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:06 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:06 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:04 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:04 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:00 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply

2025-04-11

  • 23:02 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 23:01 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 22:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 22:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 21:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2014.codfw.wmnet with OS bullseye
  • 21:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2014.codfw.wmnet with reason: host reimage
  • 21:37 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2014.codfw.wmnet with reason: host reimage
  • 20:58 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from cirrussearch2014 to cirrussearch2104
  • 20:58 bking@cumin2002: START - Cookbook sre.hosts.rename from cirrussearch2014 to cirrussearch2104
  • 20:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2105.codfw.wmnet with OS bullseye
  • 20:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2014
  • 20:41 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2014
  • 20:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2105.codfw.wmnet with reason: host reimage
  • 20:35 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2014
  • 20:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2014.codfw.wmnet 69.48.192.10.in-addr.arpa 9.6.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:35 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2014.codfw.wmnet 69.48.192.10.in-addr.arpa 9.6.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:35 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2014 - bking@cumin2002"
  • 20:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2014 - bking@cumin2002"
  • 20:32 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2105.codfw.wmnet with reason: host reimage
  • 20:27 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:26 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2014
  • 20:25 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2014.codfw.wmnet with OS bullseye
  • 20:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2105
  • 20:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2105
  • 20:14 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2105
  • 20:14 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2105.codfw.wmnet 70.48.192.10.in-addr.arpa 0.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:14 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2105.codfw.wmnet 70.48.192.10.in-addr.arpa 0.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:14 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:14 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2105 - ryankemper@cumin2002"
  • 20:14 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2105 - ryankemper@cumin2002"
  • 20:13 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2104.codfw.wmnet on all recursors
  • 20:13 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2104.codfw.wmnet on all recursors
  • 20:11 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "fix typo (cirrussearch2014 should be cirrussearch2104) - bking@cumin2002 - T388610"
  • 20:11 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "fix typo (cirrussearch2014 should be cirrussearch2104) - bking@cumin2002 - T388610"
  • 20:06 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 20:06 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2105
  • 20:06 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2105.codfw.wmnet with OS bullseye
  • 19:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2105 to cirrussearch2105
  • 19:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2105
  • 19:57 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2105
  • 19:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2105 to cirrussearch2105 - ryankemper@cumin2002"
  • 19:57 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2105 to cirrussearch2105 - ryankemper@cumin2002"
  • 19:52 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2014.codfw.wmnet on all recursors
  • 19:52 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2014.codfw.wmnet on all recursors
  • 19:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2104.codfw.wmnet on all recursors
  • 19:49 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2104.codfw.wmnet on all recursors
  • 19:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2104 to cirrussearch2014
  • 19:48 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2014
  • 19:48 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2014
  • 19:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2104 to cirrussearch2014 - bking@cumin2002"
  • 19:45 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2104 to cirrussearch2014 - bking@cumin2002"
  • 19:45 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 19:44 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic2105 to cirrussearch2105
  • 19:40 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:39 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2104 to cirrussearch2014
  • 18:45 topranks: remove et-0/0/0 from ae0 LAG bundle on cr3-ulsfo and cr4-ulsfo T390731
  • 18:41 cmooney@dns2005: END - running authdns-update
  • 18:41 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns records for new separate routed link in ulsfo - cmooney@cumin1002"
  • 18:39 cmooney@dns2005: START - running authdns-update
  • 18:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns records for new separate routed link in ulsfo - cmooney@cumin1002"
  • 18:32 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 17:53 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:53 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add test server IP dns nokia lab - cmooney@cumin1002"
  • 17:53 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add test server IP dns nokia lab - cmooney@cumin1002"
  • 17:47 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 17:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2085.codfw.wmnet with OS bullseye
  • 17:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1007.eqiad.wmnet with OS bullseye
  • 17:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:37 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1007.eqiad.wmnet with reason: host reimage
  • 17:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1007.eqiad.wmnet with reason: host reimage
  • 17:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2085.codfw.wmnet with reason: host reimage
  • 17:15 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2085.codfw.wmnet with reason: host reimage
  • 17:08 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-druid1007.eqiad.wmnet with OS bullseye
  • 17:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2085
  • 17:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2085
  • 16:59 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2085
  • 16:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2085.codfw.wmnet 72.48.192.10.in-addr.arpa 2.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:59 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2085.codfw.wmnet 72.48.192.10.in-addr.arpa 2.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:59 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2085 - bking@cumin2002"
  • 16:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2085 - bking@cumin2002"
  • 16:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:54 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:48 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-druid1007
  • 16:48 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-druid1007
  • 16:47 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:45 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-druid1007
  • 16:45 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-druid1007
  • 16:44 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:44 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2085
  • 16:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2085.codfw.wmnet with OS bullseye
  • 16:42 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cirrussearch2085.codfw.wmnet with OS bullseye
  • 16:42 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2085.codfw.wmnet with OS bullseye
  • 16:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:33 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2085.codfw.wmnet with OS bullseye
  • 16:33 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host cirrussearch2085
  • 16:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:33 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2085
  • 16:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2085.codfw.wmnet with OS bullseye
  • 16:32 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2085.codfw.wmnet on all recursors
  • 16:32 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2085.codfw.wmnet on all recursors
  • 16:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2085 to cirrussearch2085
  • 16:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2085
  • 16:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2085
  • 16:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:27 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2085 to cirrussearch2085 - bking@cumin2002"
  • 16:27 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on 15 hosts with reason: reimaging/migrating hosts
  • 16:26 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2085 to cirrussearch2085 - bking@cumin2002"
  • 16:22 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:21 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2085 to cirrussearch2085
  • 16:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-druid1007
  • 16:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-druid1007
  • 16:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1013.eqiad.wmnet with OS bullseye
  • 16:09 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1012.eqiad.wmnet with OS bullseye
  • 16:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add test server IP dns nokia lab - cmooney@cumin1002"
  • 15:47 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add test server IP dns nokia lab - cmooney@cumin1002"
  • 15:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1013.eqiad.wmnet with reason: host reimage
  • 15:43 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1013.eqiad.wmnet with reason: host reimage
  • 15:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1012.eqiad.wmnet with reason: host reimage
  • 15:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1012.eqiad.wmnet with reason: host reimage
  • 15:37 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia nginx_1.22.1-9+deb12u1+ech1_amd64.changes: T205378
  • 15:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host druid1013.eqiad.wmnet with OS bullseye
  • 15:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host druid1012.eqiad.wmnet with OS bullseye
  • 15:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-druid1007.eqiad.wmnet with OS bullseye
  • 15:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2114.codfw.wmnet with OS bullseye
  • 15:23 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia openssl_3.4.1-1+ech3_amd64.changes: T205378
  • 15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2142.codfw.wmnet
  • 15:23 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2142.codfw.wmnet
  • 15:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host druid1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1006.eqiad.wmnet with OS bullseye
  • 15:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:22 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker2142.codfw.wmnet
  • 15:19 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2142.codfw.wmnet
  • 15:19 claime: homer lsw1-c2-codfw* commit T391341
  • 15:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host druid1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1006.eqiad.wmnet with reason: host reimage
  • 15:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:05 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:05 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1006.eqiad.wmnet with reason: host reimage
  • 15:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2114.codfw.wmnet with reason: host reimage
  • 15:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2114.codfw.wmnet with reason: host reimage
  • 14:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2142']
  • 14:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142']
  • 14:53 sukhe: reprepro -C component/nginx-ech remove bookworm-wikimedia libssl3t64: removing libssl3t* since we dropped support for 64-bit time
  • 14:52 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:49 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on releases2003.codfw.wmnet with reason: Bookworm Re-image
  • 14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2114
  • 14:43 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2114
  • 14:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2114.codfw.wmnet with OS bullseye
  • 14:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-druid1007.eqiad.wmnet with OS bullseye
  • 14:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-druid1006.eqiad.wmnet with OS bullseye
  • 13:33 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia openssl_3.4.1-1+ech2_amd64.changes: T205378
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Change weight for db1180 T390510', diff saved to https://phabricator.wikimedia.org/P74901 and previous config saved to /var/cache/conftool/dbconfig/20250411-132518-marostegui.json
  • 12:10 godog: bounce thanos-query thanos-query-frontend thanos-store on titan1*
  • 11:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1169.eqiad.wmnet
  • 11:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1169.eqiad.wmnet
  • 10:29 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1169.eqiad.wmnet
  • 10:22 btullis@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1169.eqiad.wmnet
  • 10:09 slyngshede@dns1004: END - running authdns-update
  • 10:07 slyngshede@dns1004: START - running authdns-update
  • 10:07 slyngshede@dns1004: START - running authdns-update
  • 09:56 slyngshede@dns1004: END - running authdns-update
  • 09:53 slyngshede@dns1004: START - running authdns-update
  • 09:53 slyngshede@dns1004: END - running authdns-update
  • 09:51 slyngshede@dns1004: START - running authdns-update
  • 09:46 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1169.eqiad.wmnet with OS bullseye
  • 09:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1169.eqiad.wmnet with reason: host reimage
  • 09:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:20 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1169.eqiad.wmnet with reason: host reimage
  • 09:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:05 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1169.eqiad.wmnet with OS bullseye
  • 09:05 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1169.eqiad.wmnet with OS bullseye
  • 08:44 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1169.eqiad.wmnet with OS bullseye
  • 08:44 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1169.eqiad.wmnet with OS bullseye
  • 07:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 07:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply

2025-04-10

  • 22:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T391056)', diff saved to https://phabricator.wikimedia.org/P74899 and previous config saved to /var/cache/conftool/dbconfig/20250410-223055-fceratto.json
  • 22:20 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2055.codfw.wmnet|cirrussearch2056.codfw.wmnet|cirrussearch2062.codfw.wmnet|cirrussearch2068.codfw.wmnet|cirrussearch2069.codfw.wmnet|cirrussearch2074.codfw.wmnet|cirrussearch2075.codfw.wmnet|cirrussearch2087.codfw.wmnet|cirrussearch2088.codfw.wmnet|cirrussearch2089.codfw.wmnet|cirrussearch2090.codfw.wmnet|cirrussearch2091.codf
  • 22:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P74898 and previous config saved to /var/cache/conftool/dbconfig/20250410-221548-fceratto.json
  • 22:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P74897 and previous config saved to /var/cache/conftool/dbconfig/20250410-220040-fceratto.json
  • 21:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T391056)', diff saved to https://phabricator.wikimedia.org/P74896 and previous config saved to /var/cache/conftool/dbconfig/20250410-214533-fceratto.json
  • 21:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2228 (T391056)', diff saved to https://phabricator.wikimedia.org/P74894 and previous config saved to /var/cache/conftool/dbconfig/20250410-214205-fceratto.json
  • 21:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 21:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T391056)', diff saved to https://phabricator.wikimedia.org/P74893 and previous config saved to /var/cache/conftool/dbconfig/20250410-214128-fceratto.json
  • 21:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P74892 and previous config saved to /var/cache/conftool/dbconfig/20250410-212621-fceratto.json
  • 21:16 bking@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 21:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P74891 and previous config saved to /var/cache/conftool/dbconfig/20250410-211114-fceratto.json
  • 20:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T391056)', diff saved to https://phabricator.wikimedia.org/P74890 and previous config saved to /var/cache/conftool/dbconfig/20250410-205606-fceratto.json
  • 20:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2223 (T391056)', diff saved to https://phabricator.wikimedia.org/P74889 and previous config saved to /var/cache/conftool/dbconfig/20250410-205211-fceratto.json
  • 20:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2223.codfw.wmnet with reason: Maintenance
  • 20:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74888 and previous config saved to /var/cache/conftool/dbconfig/20250410-205148-fceratto.json
  • 20:41 bking@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2091
  • 20:41 bking@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2091
  • 20:40 bking@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2091
  • 20:40 bking@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2091.codfw.wmnet 99.0.192.10.in-addr.arpa 9.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:40 bking@cumin1002: START - Cookbook sre.dns.wipe-cache cirrussearch2091.codfw.wmnet 99.0.192.10.in-addr.arpa 9.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:40 bking@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:40 bking@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2091 - bking@cumin1002"
  • 20:40 bking@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2091 - bking@cumin1002"
  • 20:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P74887 and previous config saved to /var/cache/conftool/dbconfig/20250410-203640-fceratto.json
  • 20:34 bking@cumin1002: START - Cookbook sre.dns.netbox
  • 20:34 bking@cumin1002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2091
  • 20:34 bking@cumin1002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 20:30 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 20:30 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 20:28 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 20:27 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 20:26 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 20:26 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 20:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2091 to cirrussearch2091
  • 20:24 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2091
  • 20:24 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2091
  • 20:24 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:24 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2091 to cirrussearch2091 - bking@cumin2002"
  • 20:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P74886 and previous config saved to /var/cache/conftool/dbconfig/20250410-202132-fceratto.json
  • 20:17 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2091 to cirrussearch2091 - bking@cumin2002"
  • 20:17 cdobbins@dns1004: END - running authdns-update
  • 20:15 cdobbins@dns1004: START - running authdns-update
  • 20:13 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:13 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2091 to cirrussearch2091
  • 20:09 cdobbins@dns1004: END - running authdns-update
  • 20:07 cdobbins@dns1004: START - running authdns-update
  • 20:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74885 and previous config saved to /var/cache/conftool/dbconfig/20250410-200625-fceratto.json
  • 20:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74884 and previous config saved to /var/cache/conftool/dbconfig/20250410-200233-fceratto.json
  • 20:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 20:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 20:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74883 and previous config saved to /var/cache/conftool/dbconfig/20250410-200022-fceratto.json
  • 19:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P74882 and previous config saved to /var/cache/conftool/dbconfig/20250410-194514-fceratto.json
  • 19:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2111.codfw.wmnet with OS bullseye
  • 19:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P74881 and previous config saved to /var/cache/conftool/dbconfig/20250410-193007-fceratto.json
  • 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2111.codfw.wmnet with reason: host reimage
  • 19:22 tzatziki: removing 2 files for legal compliance
  • 19:20 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2111.codfw.wmnet with reason: host reimage
  • 19:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74880 and previous config saved to /var/cache/conftool/dbconfig/20250410-191459-fceratto.json
  • 19:13 tzatziki: removing 1 file for legal compliance
  • 19:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74879 and previous config saved to /var/cache/conftool/dbconfig/20250410-191226-fceratto.json
  • 19:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 19:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74878 and previous config saved to /var/cache/conftool/dbconfig/20250410-191214-fceratto.json
  • 18:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2111
  • 18:58 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2111
  • 18:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2111.codfw.wmnet with OS bullseye
  • 18:57 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2111.codfw.wmnet with OS bullseye
  • 18:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P74877 and previous config saved to /var/cache/conftool/dbconfig/20250410-185706-fceratto.json
  • 18:45 jforrester@deploy1003: Finished scap sync-world: Backport for WikifunctionsClientUsageUpdateJob: Also init targetPageNamespace, Special pages: Don't list or let execute repo-only ones on client wikis (T391594), InitializeSettings: add wgSecurePollEditOtherWikis (T384302) (duration: 12m 42s)
  • 18:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P74875 and previous config saved to /var/cache/conftool/dbconfig/20250410-184159-fceratto.json
  • 18:38 jforrester@deploy1003: novemlinguae, jforrester: Continuing with sync
  • 18:37 jforrester@deploy1003: novemlinguae, jforrester: Backport for WikifunctionsClientUsageUpdateJob: Also init targetPageNamespace, Special pages: Don't list or let execute repo-only ones on client wikis (T391594), InitializeSettings: add wgSecurePollEditOtherWikis (T384302) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:33 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2111
  • 18:33 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2111
  • 18:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2111.codfw.wmnet with OS bullseye
  • 18:33 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cirrussearch2111']
  • 18:32 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2111']
  • 18:32 jforrester@deploy1003: Started scap sync-world: Backport for WikifunctionsClientUsageUpdateJob: Also init targetPageNamespace, Special pages: Don't list or let execute repo-only ones on client wikis (T391594), InitializeSettings: add wgSecurePollEditOtherWikis (T384302)
  • 18:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2111']
  • 18:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74873 and previous config saved to /var/cache/conftool/dbconfig/20250410-182652-fceratto.json
  • 18:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74872 and previous config saved to /var/cache/conftool/dbconfig/20250410-182319-fceratto.json
  • 18:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 18:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T391056)', diff saved to https://phabricator.wikimedia.org/P74871 and previous config saved to /var/cache/conftool/dbconfig/20250410-182257-fceratto.json
  • 18:21 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2111']
  • 18:20 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.24 refs T386219
  • 18:11 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2111']
  • 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cirrussearch2111']
  • 18:11 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2111']
  • 18:09 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2111']
  • 18:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P74870 and previous config saved to /var/cache/conftool/dbconfig/20250410-180749-fceratto.json
  • 18:07 brennen: 1.44.0-wmf.24 train status (T386219): logs quiet, no current blockers, moving to all wikis
  • 18:00 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2111']
  • 18:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2111.codfw.wmnet with OS bullseye
  • 17:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2111
  • 17:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2111
  • 17:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2111.codfw.wmnet with OS bullseye
  • 17:53 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2111.codfw.wmnet with OS bullseye
  • 17:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P74869 and previous config saved to /var/cache/conftool/dbconfig/20250410-175242-fceratto.json
  • 17:45 dancy@deploy1003: Finished scap sync-world: Backport for WikiLambdaApiBase: Add logging for every remaining dieWith?(Z)Error, Set WikiLambdaClientTargetAPI default value to protocol-relative, so HSTS doesn't sting us (T391534), WikifunctionsClientUsageUpdateJob: Don't pass a heavy Title in, just the scalars (T391533) (duration: 13m 28s)
  • 17:39 dancy@deploy1003: dancy, jforrester: Continuing with sync
  • 17:37 dancy@deploy1003: dancy, jforrester: Backport for WikiLambdaApiBase: Add logging for every remaining dieWith?(Z)Error, Set WikiLambdaClientTargetAPI default value to protocol-relative, so HSTS doesn't sting us (T391534), WikifunctionsClientUsageUpdateJob: Don't pass a heavy Title in, just the scalars (T391533) synced to the testservers (https://wikitech.wikimedi
  • 17:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T391056)', diff saved to https://phabricator.wikimedia.org/P74868 and previous config saved to /var/cache/conftool/dbconfig/20250410-173735-fceratto.json
  • 17:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2111
  • 17:35 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2111
  • 17:35 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2111.codfw.wmnet with OS bullseye
  • 17:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2111.codfw.wmnet on all recursors
  • 17:35 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2111.codfw.wmnet on all recursors
  • 17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T391056)', diff saved to https://phabricator.wikimedia.org/P74867 and previous config saved to /var/cache/conftool/dbconfig/20250410-173339-fceratto.json
  • 17:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74866 and previous config saved to /var/cache/conftool/dbconfig/20250410-173315-fceratto.json
  • 17:33 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2111 to cirrussearch2111
  • 17:32 dancy@deploy1003: Started scap sync-world: Backport for WikiLambdaApiBase: Add logging for every remaining dieWith?(Z)Error, Set WikiLambdaClientTargetAPI default value to protocol-relative, so HSTS doesn't sting us (T391534), WikifunctionsClientUsageUpdateJob: Don't pass a heavy Title in, just the scalars (T391533)
  • 17:32 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2111
  • 17:31 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2111
  • 17:31 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:31 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2111 to cirrussearch2111 - bking@cumin2002"
  • 17:31 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2111 to cirrussearch2111 - bking@cumin2002"
  • 17:25 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 17:24 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2111 to cirrussearch2111
  • 17:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P74865 and previous config saved to /var/cache/conftool/dbconfig/20250410-171808-fceratto.json
  • 17:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P74863 and previous config saved to /var/cache/conftool/dbconfig/20250410-170300-fceratto.json
  • 16:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2075.codfw.wmnet with OS bullseye
  • 16:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74862 and previous config saved to /var/cache/conftool/dbconfig/20250410-164753-fceratto.json
  • 16:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T391056)', diff saved to https://phabricator.wikimedia.org/P74861 and previous config saved to /var/cache/conftool/dbconfig/20250410-164400-fceratto.json
  • 16:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 16:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 16:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 16:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 16:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T391056)', diff saved to https://phabricator.wikimedia.org/P74860 and previous config saved to /var/cache/conftool/dbconfig/20250410-164049-fceratto.json
  • 16:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2075.codfw.wmnet with reason: host reimage
  • 16:34 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2075.codfw.wmnet with reason: host reimage
  • 16:33 jiji@cumin1002: conftool action : set/pooled=yes; selector: name=mwdebug2002.codfw.wmnet
  • 16:30 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwdebug2002.codfw.wmnet with OS bullseye
  • 16:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P74859 and previous config saved to /var/cache/conftool/dbconfig/20250410-162542-fceratto.json
  • 16:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2075
  • 16:18 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2075
  • 16:18 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2075
  • 16:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2075.codfw.wmnet 145.0.192.10.in-addr.arpa 5.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:18 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2075.codfw.wmnet 145.0.192.10.in-addr.arpa 5.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:18 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2075 - bking@cumin2002"
  • 16:18 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2075 - bking@cumin2002"
  • 16:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P74858 and previous config saved to /var/cache/conftool/dbconfig/20250410-161036-fceratto.json
  • 16:06 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:06 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2075
  • 16:06 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2075.codfw.wmnet with OS bullseye
  • 16:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2075 to cirrussearch2075
  • 16:02 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2075
  • 16:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2075
  • 16:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2075 to cirrussearch2075 - bking@cumin2002"
  • 15:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T391056)', diff saved to https://phabricator.wikimedia.org/P74857 and previous config saved to /var/cache/conftool/dbconfig/20250410-155528-fceratto.json
  • 15:54 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on releases2003.codfw.wmnet with reason: Bookworm Re-image
  • 15:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T391056)', diff saved to https://phabricator.wikimedia.org/P74856 and previous config saved to /var/cache/conftool/dbconfig/20250410-155241-fceratto.json
  • 15:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 15:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T391056)', diff saved to https://phabricator.wikimedia.org/P74855 and previous config saved to /var/cache/conftool/dbconfig/20250410-155220-fceratto.json
  • 15:52 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug2002.codfw.wmnet with reason: host reimage
  • 15:48 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug2002.codfw.wmnet with reason: host reimage
  • 15:41 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2075 to cirrussearch2075 - bking@cumin2002"
  • 15:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P74854 and previous config saved to /var/cache/conftool/dbconfig/20250410-153713-fceratto.json
  • 15:29 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mwdebug2002.codfw.wmnet with OS bullseye
  • 15:23 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2075 to cirrussearch2075
  • 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P74852 and previous config saved to /var/cache/conftool/dbconfig/20250410-152206-fceratto.json
  • 15:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2074.codfw.wmnet with OS bullseye
  • 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T391056)', diff saved to https://phabricator.wikimedia.org/P74851 and previous config saved to /var/cache/conftool/dbconfig/20250410-150658-fceratto.json
  • 15:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T391056)', diff saved to https://phabricator.wikimedia.org/P74850 and previous config saved to /var/cache/conftool/dbconfig/20250410-150431-fceratto.json
  • 15:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 15:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T391056)', diff saved to https://phabricator.wikimedia.org/P74849 and previous config saved to /var/cache/conftool/dbconfig/20250410-150407-fceratto.json
  • 14:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2074.codfw.wmnet with reason: host reimage
  • 14:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P74848 and previous config saved to /var/cache/conftool/dbconfig/20250410-144900-fceratto.json
  • 14:47 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2074.codfw.wmnet with reason: host reimage
  • 14:37 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 14:36 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 14:36 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 14:36 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 14:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P74847 and previous config saved to /var/cache/conftool/dbconfig/20250410-143352-fceratto.json
  • 14:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2074
  • 14:31 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2074
  • 14:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2074
  • 14:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2074.codfw.wmnet 138.0.192.10.in-addr.arpa 8.3.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:28 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2074.codfw.wmnet 138.0.192.10.in-addr.arpa 8.3.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2074 - bking@cumin2002"
  • 14:28 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2074 - bking@cumin2002"
  • 14:24 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:23 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2074
  • 14:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2074.codfw.wmnet with OS bullseye
  • 14:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2074 to cirrussearch2074
  • 14:21 godog: stop curator_actions_cluster_wide.service on logging-sd1001 - forcemerge causing kafka lag
  • 14:21 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2074
  • 14:21 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2074
  • 14:21 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:21 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2074 to cirrussearch2074 - bking@cumin2002"
  • 14:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2074 to cirrussearch2074 - bking@cumin2002"
  • 14:20 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1002.eqiad.wmnet
  • 14:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T391056)', diff saved to https://phabricator.wikimedia.org/P74845 and previous config saved to /var/cache/conftool/dbconfig/20250410-141845-fceratto.json
  • 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142']
  • 14:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T391056)', diff saved to https://phabricator.wikimedia.org/P74844 and previous config saved to /var/cache/conftool/dbconfig/20250410-141619-fceratto.json
  • 14:16 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T391056)', diff saved to https://phabricator.wikimedia.org/P74843 and previous config saved to /var/cache/conftool/dbconfig/20250410-141608-fceratto.json
  • 14:15 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2074 to cirrussearch2074
  • 14:14 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-misc1002.eqiad.wmnet
  • 14:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1001.eqiad.wmnet
  • 14:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-misc1001.eqiad.wmnet
  • 14:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P74842 and previous config saved to /var/cache/conftool/dbconfig/20250410-140100-fceratto.json
  • 13:56 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug2002.codfw.wmnet
  • 13:55 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 13:55 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 13:55 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:54 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142']
  • 13:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142']
  • 13:49 jiji@cumin1002: conftool action : set/pooled=yes; selector: name=mwdebug1002.eqiad.wmnet
  • 13:46 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1005.eqiad.wmnet
  • 13:46 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:46 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 13:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P74841 and previous config saved to /var/cache/conftool/dbconfig/20250410-134553-fceratto.json
  • 13:44 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:44 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 13:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 13:37 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for AX: Enable Quick Surveys extension on Asturian and Lombard wiki (T390023), AX: Enable entry-points on Asturian and Lombard wiki (T390023) (duration: 15m 42s)
  • 13:34 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1135431 to enable haproxy requestctl rules everywhere (T370745)
  • 13:34 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1005.eqiad.wmnet
  • 13:31 lucaswerkmeister-wmde@deploy1003: abi, lucaswerkmeister-wmde: Continuing with sync
  • 13:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T391056)', diff saved to https://phabricator.wikimedia.org/P74840 and previous config saved to /var/cache/conftool/dbconfig/20250410-133046-fceratto.json
  • 13:28 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwdebug1002.eqiad.wmnet with OS bullseye
  • 13:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T391056)', diff saved to https://phabricator.wikimedia.org/P74839 and previous config saved to /var/cache/conftool/dbconfig/20250410-132756-fceratto.json
  • 13:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 13:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T391056)', diff saved to https://phabricator.wikimedia.org/P74838 and previous config saved to /var/cache/conftool/dbconfig/20250410-132744-fceratto.json
  • 13:27 lucaswerkmeister-wmde@deploy1003: abi, lucaswerkmeister-wmde: Backport for AX: Enable Quick Surveys extension on Asturian and Lombard wiki (T390023), AX: Enable entry-points on Asturian and Lombard wiki (T390023) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:26 tappof: expand LVs on prometheus instances (k8s-dse)
  • 13:22 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for AX: Enable Quick Surveys extension on Asturian and Lombard wiki (T390023), AX: Enable entry-points on Asturian and Lombard wiki (T390023)
  • 13:20 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1169.eqiad.wmnet with OS bullseye
  • 13:13 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 13:12 klausman@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P74837 and previous config saved to /var/cache/conftool/dbconfig/20250410-131237-fceratto.json
  • 12:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P74836 and previous config saved to /var/cache/conftool/dbconfig/20250410-125729-fceratto.json
  • 12:56 klausman@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 12:56 klausman@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 12:52 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug1002.eqiad.wmnet with reason: host reimage
  • 12:51 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2002.codfw.wmnet
  • 12:48 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug1002.eqiad.wmnet with reason: host reimage
  • 12:45 reedy@deploy1003: Synchronized wmf-config/interwiki-labs.php: Update! (duration: 14m 07s)
  • 12:43 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-wf2002.codfw.wmnet
  • 12:43 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2001.codfw.wmnet
  • 12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T391056)', diff saved to https://phabricator.wikimedia.org/P74834 and previous config saved to /var/cache/conftool/dbconfig/20250410-124222-fceratto.json
  • 12:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T391056)', diff saved to https://phabricator.wikimedia.org/P74833 and previous config saved to /var/cache/conftool/dbconfig/20250410-123931-fceratto.json
  • 12:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T391056)', diff saved to https://phabricator.wikimedia.org/P74832 and previous config saved to /var/cache/conftool/dbconfig/20250410-123850-fceratto.json
  • 12:37 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-wf2001.codfw.wmnet
  • 12:36 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1002.eqiad.wmnet
  • 12:29 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-wf1002.eqiad.wmnet
  • 12:29 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1001.eqiad.wmnet
  • 12:28 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mwdebug1002.eqiad.wmnet with OS bullseye
  • 12:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
  • 12:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P74831 and previous config saved to /var/cache/conftool/dbconfig/20250410-122343-fceratto.json
  • 12:22 btullis@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1169
  • 12:22 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-wf1001.eqiad.wmnet
  • 12:22 btullis@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1169
  • 12:21 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:21 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1169 - btullis@cumin1002"
  • 12:20 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1169 - btullis@cumin1002"
  • 12:20 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
  • 12:20 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 12:19 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 12:18 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 12:17 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 12:15 cgoubert@deploy1003: Finished scap sync-world: Rebuilding mediawiki images to pick up new base images 1135694 - T387208 (duration: 44m 51s)
  • 12:14 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 12:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P74829 and previous config saved to /var/cache/conftool/dbconfig/20250410-120835-fceratto.json
  • 12:08 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:06 btullis@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1169
  • 12:06 btullis@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1169
  • 12:06 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:03 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 12:03 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:01 btullis@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1169
  • 12:01 btullis@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1169
  • 12:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 11:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 11:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 11:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 11:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 11:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 11:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 11:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
  • 11:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
  • 11:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 11:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 11:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 11:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 11:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 11:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 11:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T391056)', diff saved to https://phabricator.wikimedia.org/P74828 and previous config saved to /var/cache/conftool/dbconfig/20250410-115328-fceratto.json
  • 11:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 11:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1159 (T391056)', diff saved to https://phabricator.wikimedia.org/P74827 and previous config saved to /var/cache/conftool/dbconfig/20250410-115037-fceratto.json
  • 11:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
  • 11:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 11:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
  • 11:32 cgoubert@deploy1003: Started scap sync-world: Rebuilding mediawiki images to pick up new base images 1135694 - T387208
  • 11:28 claime: Rebuilding php base images to pick up 1135694 - T387208
  • 11:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 85% (T360589) (duration: 16m 20s)
  • 11:17 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 11:15 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 85% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:10 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 85% (T360589)
  • 11:08 cgoubert@deploy1003: Finished scap sync-world: Backport for MWScript.php: exit code on mesh, longer timeout (T390972 T387208) (duration: 22m 15s)
  • 10:55 cgoubert@deploy1003: cgoubert: Continuing with sync
  • 10:54 cgoubert@deploy1003: cgoubert: Backport for MWScript.php: exit code on mesh, longer timeout (T390972 T387208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:45 cgoubert@deploy1003: Started scap sync-world: Backport for MWScript.php: exit code on mesh, longer timeout (T390972 T387208)
  • 10:28 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: sync
  • 10:28 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: sync
  • 10:26 elukey: rest-gateway from now on calls citoid on its ingress endpoint
  • 10:23 phedenskog@deploy1003: Finished deploy [performance/navtiming@94fa387]: Disable navtiming performance metrics in Graphite (duration: 00m 50s)
  • 10:23 phedenskog@deploy1003: Started deploy [performance/navtiming@94fa387]: Disable navtiming performance metrics in Graphite
  • 10:21 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: sync
  • 10:20 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: sync
  • 10:19 cgoubert@deploy1003: Started scap sync-world: Rebuilding mediawiki images to pick up new base images 1135379 - T387208
  • 10:19 cgoubert@deploy1003: sync-world aborted: Rebuilding mediawiki images to pick up new base images 1135379 - T387208 (duration: 35m 23s)
  • 09:55 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:55 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:55 fabfur@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading A:liberica
  • 09:50 fabfur@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading A:liberica
  • 09:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:44 cgoubert@deploy1003: Started scap sync-world: Rebuilding mediawiki images to pick up new base images 1135379 - T387208
  • 09:40 claime: Rebuilding php base images to pick up 1135379 - T387208
  • 09:39 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: sync
  • 09:38 elukey@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: sync
  • 09:32 topranks: decom 2x10G lag from cloudsw1-c8-eqiad to asw2-b-eqiad T391489
  • 09:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
  • 09:23 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
  • 09:17 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
  • 09:17 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
  • 09:13 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-esams and A:liberica
  • 09:11 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-esams and A:liberica
  • 09:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-drmrs and A:liberica
  • 09:08 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-drmrs and A:liberica
  • 09:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-eqsin and A:liberica
  • 09:03 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-eqsin and A:liberica
  • 09:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-ulsfo and A:liberica
  • 09:00 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-ulsfo and A:liberica
  • 08:59 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-magru and not P{lvs7003.magru.wmnet} and A:liberica
  • 08:57 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-magru and not P{lvs7003.magru.wmnet} and A:liberica
  • 08:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing P{lvs7003.magru.wmnet} and A:liberica
  • 08:53 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing P{lvs7003.magru.wmnet} and A:liberica
  • 08:41 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: sync
  • 08:41 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: sync
  • 08:40 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: sync
  • 08:40 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: sync
  • 08:40 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: sync
  • 08:40 elukey@deploy1003: helmfile [staging] START helmfile.d/services/citoid: sync
  • 08:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-canary
  • 08:22 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-canary
  • 08:20 vgutierrez: upload liberica 0.13 to bookworm-wikimedia (apt.wm.o)
  • 08:18 elukey@dns1004: END - running authdns-update
  • 08:16 elukey@dns1004: START - running authdns-update
  • 08:02 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.upgrade (exit_code=1) restarting A:liberica-canary
  • 08:01 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting A:liberica-canary
  • 07:56 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-canary
  • 07:56 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-canary
  • 07:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting A:liberica-canary
  • 07:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling A:liberica-canary
  • 07:47 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
  • 07:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
  • 07:47 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
  • 07:46 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting A:liberica-canary
  • 07:44 vgutierrez: rollback to liberica 0.11 in lvs1013
  • 07:40 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-canary
  • 07:39 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-canary
  • 07:35 vgutierrez: upload liberica 0.12 to bookworm-wikimedia (apt.wm.o)
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1180 to s6 vslow/dump', diff saved to https://phabricator.wikimedia.org/P74824 and previous config saved to /var/cache/conftool/dbconfig/20250410-072127-marostegui.json
  • 06:55 marostegui: Migrate pc2 to MariaDB 10.11 T391454
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc2 T391454', diff saved to https://phabricator.wikimedia.org/P74823 and previous config saved to /var/cache/conftool/dbconfig/20250410-065208-marostegui.json
  • 06:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Maintenance
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc2 T391454', diff saved to https://phabricator.wikimedia.org/P74822 and previous config saved to /var/cache/conftool/dbconfig/20250410-064511-marostegui.json

2025-04-09

  • 22:53 mutante: apt-staging2001 - sudo systemctl start gitlab-package-puller to fix monitoring alert
  • 22:47 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 22:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:09 jclark@cumin1002: START - Cookbook sre.hosts.provision for host druid1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:08 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for druid1012/1013 - jclark@cumin1002"
  • 22:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for druid1012/1013 - jclark@cumin1002"
  • 22:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host druid1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:04 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:40 jforrester@deploy1003: Finished scap sync-world: Backport for [test2wiki] Enable Wikifunctions client mode (T383106), MWMultiVersion: Recognise the new wikifunctionsclient dblist (duration: 18m 01s)
  • 21:34 jforrester@deploy1003: jforrester: Continuing with sync
  • 21:29 jforrester@deploy1003: jforrester: Backport for [test2wiki] Enable Wikifunctions client mode (T383106), MWMultiVersion: Recognise the new wikifunctionsclient dblist synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 21:22 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 21:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:22 jforrester@deploy1003: Started scap sync-world: Backport for [test2wiki] Enable Wikifunctions client mode (T383106), MWMultiVersion: Recognise the new wikifunctionsclient dblist
  • 21:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-druid1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-druid1006
  • 21:20 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-druid1006
  • 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-druid1007
  • 21:20 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-druid1007
  • 21:19 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:19 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-druid - jclark@cumin1002"
  • 21:19 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-druid - jclark@cumin1002"
  • 21:19 jforrester@deploy1003: Sync cancelled.
  • 21:15 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:15 jforrester@deploy1003: jforrester: Backport for [test2wiki] Enable Wikifunctions client mode (T383106) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:11 ejegg: fundraising civicrm upgraded from b20436a2 to 38a7a649
  • 21:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-druid1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:08 jforrester@deploy1003: Started scap sync-world: Backport for [test2wiki] Enable Wikifunctions client mode (T383106)
  • 19:37 dancy@deploy1003: Installation of scap version "4.153.0" completed for 2 hosts
  • 19:35 dancy@deploy1003: Installing scap version "4.153.0" for 2 host(s)
  • 19:24 fab@deploy1003: Finished deploy [airflow-dags/research@ea5f3de]: (no justification provided) (duration: 00m 41s)
  • 19:24 fab@deploy1003: Started deploy [airflow-dags/research@ea5f3de]: (no justification provided)
  • 19:14 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 18:48 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
  • 18:40 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
  • 18:20 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.24 refs T386219
  • 18:06 brennen: 1.44.0-wmf.24 train status (T386219): logs quiet, no current blockers, moving to group1
  • 18:04 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: adding net-new role
  • 17:50 swfrench@deploy1003: Finished scap sync-world: Test scap run after switching to PHP 8.1 container image for maintenance scripts - T390225 (duration: 03m 10s)
  • 17:47 swfrench@deploy1003: Started scap sync-world: Test scap run after switching to PHP 8.1 container image for maintenance scripts - T390225
  • 17:46 swfrench@deploy1003: Stopping before sync operations
  • 17:45 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 17:45 swfrench@deploy1003: Started scap sync-world: Test stop-before-sync scap run after switching to PHP 8.1 container image for maintenance scripts - T390225
  • 17:38 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 17:34 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 17:33 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 17:21 ladsgroup@deploy1003: Finished scap sync-world: Backport for Increase max db connection count before circuit breaking (T390510) (duration: 16m 47s)
  • 17:19 mutante: apt1002 - updating thirdparty/gitlab-bullseye gitlab-ce package version
  • 17:12 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 17:11 ladsgroup@deploy1003: ladsgroup: Backport for Increase max db connection count before circuit breaking (T390510) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:04 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 17:04 sukhe: forcing rechecks for pc1011 and db1151
  • 17:04 ladsgroup@deploy1003: Started scap sync-world: Backport for Increase max db connection count before circuit breaking (T390510)
  • 17:04 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 17:01 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 17:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2109.codfw.wmnet on all recursors
  • 17:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2109.codfw.wmnet on all recursors
  • 17:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2085.codfw.wmnet on all recursors
  • 17:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2085.codfw.wmnet on all recursors
  • 16:59 sukhe: [END] sudo cumin -b11 "O:mariadb::core" "run-puppet-agent"
  • 16:46 sukhe: sudo cumin -b11 "O:mariadb::core" "run-puppet-agent"
  • 16:44 sukhe: forcing puppet run on db2229
  • 16:38 sukhe: merging above change: CR 1135471
  • 16:17 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2088.codfw.wmnet with reason: host reimage
  • 16:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2068.codfw.wmnet with reason: host reimage
  • 15:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:55 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2068.codfw.wmnet with reason: host reimage
  • 15:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2088
  • 15:50 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2088
  • 15:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2088.codfw.wmnet with OS bullseye
  • 15:48 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
  • 15:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:38 sukhe: reprepro -C component/nginx-ech include bookworm-wikimedia openssl_3.4.1-1+ech1_amd64.changes: T205378
  • 15:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
  • 15:35 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names, Add wikifunctionsclient dblist for production wikis that allow embedding Wikifunctions calls
  • 15:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:26 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
  • 15:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:20 elukey: restart docker on deploy1003 to revert the push serialization change - T390251
  • 15:16 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
  • 15:11 vgutierrez: upgrading to varnish 7.1.1-1.1~bpo11+wmf3 in cp3073 (text) and cp3081 (upload) - T391334
  • 15:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2068
  • 15:07 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2068
  • 15:07 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2068
  • 15:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2068.codfw.wmnet 102.48.192.10.in-addr.arpa 2.0.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:07 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2068.codfw.wmnet 102.48.192.10.in-addr.arpa 2.0.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2068 - bking@cumin2002"
  • 15:07 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2068 - bking@cumin2002"
  • 15:06 jforrester@deploy1003: sync-world aborted: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names, Add wikifunctionsclient dblist for production wikis that allow embedding Wikifunctions calls (duration: 17m 08s)
  • 14:57 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:57 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2068
  • 14:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2068.codfw.wmnet with OS bullseye
  • 14:56 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2068.codfw.wmnet on all recursors
  • 14:55 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2068.codfw.wmnet on all recursors
  • 14:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2068 to cirrussearch2068
  • 14:55 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2068
  • 14:55 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2068
  • 14:55 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:55 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2068 to cirrussearch2068 - bking@cumin2002"
  • 14:54 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2068 to cirrussearch2068 - bking@cumin2002"
  • 14:49 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2068 to cirrussearch2068
  • 14:49 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names, Add wikifunctionsclient dblist for production wikis that allow embedding Wikifunctions calls
  • 14:47 elukey: restart docker on deploy1003
  • 14:47 jforrester@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.44.0-wmf.23,1.44.0-wmf.24 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.disco
  • 14:42 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names, Add wikifunctionsclient dblist for production wikis that allow embedding Wikifunctions calls
  • 14:41 jforrester@deploy1003: sync-world aborted: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names (duration: 06m 26s)
  • 14:36 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
  • 14:34 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names
  • 14:33 jforrester@deploy1003: sync-world aborted: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names (duration: 04m 52s)
  • 14:32 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 14:30 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2088.codfw.wmnet']
  • 14:29 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2088.codfw.wmnet with OS bullseye
  • 14:29 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names
  • 14:28 jforrester@deploy1003: sync-world aborted: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names (duration: 08m 48s)
  • 14:19 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names
  • 14:18 jforrester@deploy1003: sync-world aborted: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names (duration: 07m 47s)
  • 14:11 jforrester@deploy1003: Started scap sync-world: Backport for Move to new async Parsoid fragment provision (T373253 T388546), Switch out various old PHP aliases to the current class names
  • 14:09 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:03 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2088
  • 13:38 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2088
  • 13:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2088.codfw.wmnet with OS bullseye
  • 13:23 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwdebug1002.eqiad.wmnet with OS bullseye
  • 13:23 samtar@deploy1003: Finished scap sync-world: Backport for madwiktionary: add logo, icon, wordmark and tagline (T391318), arywiki: enable wgMinervaEnableSiteNotice (duration: 16m 14s)
  • 13:17 samtar@deploy1003: samtar, anzx: Continuing with sync
  • 13:15 samtar@deploy1003: samtar, anzx: Backport for madwiktionary: add logo, icon, wordmark and tagline (T391318), arywiki: enable wgMinervaEnableSiteNotice synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:07 samtar@deploy1003: Started scap sync-world: Backport for madwiktionary: add logo, icon, wordmark and tagline (T391318), arywiki: enable wgMinervaEnableSiteNotice
  • 13:00 awight: special window completed
  • 12:47 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug1002.eqiad.wmnet with reason: host reimage
  • 12:43 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug1002.eqiad.wmnet with reason: host reimage
  • 12:27 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mwdebug1002.eqiad.wmnet with OS bullseye
  • 12:10 effie: mwdebug1002 has been depooled and removed from scap dsh
  • 12:09 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
  • 12:09 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
  • 12:06 effie: prepping mwdebug1002 for reimage
  • 11:41 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:41 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:38 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:37 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:32 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:30 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:29 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:28 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1407.eqiad.wmnet
  • 11:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw1407.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
  • 11:21 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw1407.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
  • 11:20 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 11:19 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 11:19 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 11:18 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 11:17 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 11:17 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 11:16 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 11:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
  • 11:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
  • 11:10 hnowlan@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw1407.eqiad.wmnet
  • 11:07 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
  • 11:07 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
  • 11:06 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
  • 11:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
  • 11:04 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=mwdebug1002.eqiad.wmnet
  • 11:01 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2278.codfw.wmnet
  • 11:01 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:00 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
  • 10:59 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
  • 10:59 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 10:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1349-1351].eqiad.wmnet
  • 10:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1349-1351].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
  • 10:59 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1349-1351].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
  • 10:54 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 10:53 hnowlan@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw2278.codfw.wmnet
  • 10:50 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[2278-2279].codfw.wmnet
  • 10:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2278-2279].codfw.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
  • 10:49 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2278-2279].codfw.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1002"
  • 10:42 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 10:41 hnowlan@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[1349-1351].eqiad.wmnet
  • 10:37 hnowlan@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[2278-2279].codfw.wmnet
  • 10:23 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: sync
  • 10:22 elukey@deploy1003: helmfile [staging] START helmfile.d/services/citoid: sync
  • 10:19 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 10:18 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 80% (T360589) (duration: 14m 19s)
  • 10:18 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 10:18 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 10:18 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 10:12 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
  • 10:12 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 10:12 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 80% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
  • 10:05 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
  • 10:05 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
  • 10:04 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 80% (T360589)
  • 09:37 cgoubert@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2142.codfw.wmnet with reason: Hardware failure
  • 09:36 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host wikikube-worker2142.codfw.wmnet
  • 09:36 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2142.codfw.wmnet
  • 09:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 09:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 09:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 09:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 09:18 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: sync
  • 09:18 elukey@deploy1003: helmfile [staging] START helmfile.d/services/citoid: sync
  • 09:05 elukey: rollout security upgrades for ghostscript
  • 08:54 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 08:54 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 08:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2243 (T391056)', diff saved to https://phabricator.wikimedia.org/P74814 and previous config saved to /var/cache/conftool/dbconfig/20250409-085347-fceratto.json
  • 08:50 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 08:49 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 08:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2243', diff saved to https://phabricator.wikimedia.org/P74813 and previous config saved to /var/cache/conftool/dbconfig/20250409-083840-fceratto.json
  • 08:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2243', diff saved to https://phabricator.wikimedia.org/P74812 and previous config saved to /var/cache/conftool/dbconfig/20250409-082333-fceratto.json
  • 08:09 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 08:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2090.codfw.wmnet with OS bullseye
  • 08:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2243 (T391056)', diff saved to https://phabricator.wikimedia.org/P74811 and previous config saved to /var/cache/conftool/dbconfig/20250409-080826-fceratto.json
  • 07:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2243 (T391056)', diff saved to https://phabricator.wikimedia.org/P74810 and previous config saved to /var/cache/conftool/dbconfig/20250409-075815-fceratto.json
  • 07:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2243.codfw.wmnet with reason: Maintenance
  • 07:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2242.codfw.wmnet with reason: Maintenance
  • 07:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2090.codfw.wmnet with reason: host reimage
  • 07:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2241.codfw.wmnet with reason: Maintenance
  • 07:37 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2090.codfw.wmnet with reason: host reimage
  • 07:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 07:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 07:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T391056)', diff saved to https://phabricator.wikimedia.org/P74809 and previous config saved to /var/cache/conftool/dbconfig/20250409-072240-fceratto.json
  • 07:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2090
  • 07:19 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2090
  • 07:18 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2090
  • 07:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2090.codfw.wmnet 97.0.192.10.in-addr.arpa 7.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 07:17 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2090.codfw.wmnet 97.0.192.10.in-addr.arpa 7.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 07:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:17 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2090 - bking@cumin2002"
  • 07:17 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2090 - bking@cumin2002"
  • 07:09 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 07:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P74808 and previous config saved to /var/cache/conftool/dbconfig/20250409-070733-fceratto.json
  • 07:05 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2090
  • 07:05 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2090.codfw.wmnet with OS bullseye
  • 06:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P74807 and previous config saved to /var/cache/conftool/dbconfig/20250409-065225-fceratto.json
  • 06:47 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2090.codfw.wmnet on all recursors
  • 06:47 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2090.codfw.wmnet on all recursors
  • 06:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2090 to cirrussearch2090
  • 06:46 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2090
  • 06:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T391056)', diff saved to https://phabricator.wikimedia.org/P74806 and previous config saved to /var/cache/conftool/dbconfig/20250409-063718-fceratto.json
  • 06:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T391056)', diff saved to https://phabricator.wikimedia.org/P74805 and previous config saved to /var/cache/conftool/dbconfig/20250409-062542-fceratto.json
  • 06:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 06:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T391056)', diff saved to https://phabricator.wikimedia.org/P74804 and previous config saved to /var/cache/conftool/dbconfig/20250409-062519-fceratto.json
  • 06:20 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2090
  • 06:20 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:20 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2090 to cirrussearch2090 - bking@cumin2002"
  • 06:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P74803 and previous config saved to /var/cache/conftool/dbconfig/20250409-061012-fceratto.json
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repool ms1 T391317', diff saved to https://phabricator.wikimedia.org/P74802 and previous config saved to /var/cache/conftool/dbconfig/20250409-055903-marostegui.json
  • 05:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P74801 and previous config saved to /var/cache/conftool/dbconfig/20250409-055504-fceratto.json
  • 05:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2142.codfw.wmnet,db1152.eqiad.wmnet with reason: Maintenance
  • 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool ms1 T391317', diff saved to https://phabricator.wikimedia.org/P74800 and previous config saved to /var/cache/conftool/dbconfig/20250409-055028-marostegui.json
  • 05:49 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2090 to cirrussearch2090 - bking@cumin2002"
  • 05:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T391056)', diff saved to https://phabricator.wikimedia.org/P74799 and previous config saved to /var/cache/conftool/dbconfig/20250409-053957-fceratto.json
  • 05:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T391056)', diff saved to https://phabricator.wikimedia.org/P74798 and previous config saved to /var/cache/conftool/dbconfig/20250409-052719-fceratto.json
  • 05:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 05:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74797 and previous config saved to /var/cache/conftool/dbconfig/20250409-052656-fceratto.json
  • 05:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P74796 and previous config saved to /var/cache/conftool/dbconfig/20250409-051149-fceratto.json
  • 05:05 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 05:05 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2090 to cirrussearch2090
  • 05:01 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 04:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P74795 and previous config saved to /var/cache/conftool/dbconfig/20250409-045642-fceratto.json
  • 04:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74794 and previous config saved to /var/cache/conftool/dbconfig/20250409-044134-fceratto.json
  • 04:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74793 and previous config saved to /var/cache/conftool/dbconfig/20250409-042846-fceratto.json
  • 04:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 04:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74792 and previous config saved to /var/cache/conftool/dbconfig/20250409-042824-fceratto.json
  • 04:17 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 04:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2089.codfw.wmnet with OS bullseye
  • 04:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P74791 and previous config saved to /var/cache/conftool/dbconfig/20250409-041317-fceratto.json
  • 03:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P74790 and previous config saved to /var/cache/conftool/dbconfig/20250409-035810-fceratto.json
  • 03:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2089.codfw.wmnet with reason: host reimage
  • 03:48 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2089.codfw.wmnet with reason: host reimage
  • 03:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74789 and previous config saved to /var/cache/conftool/dbconfig/20250409-034302-fceratto.json
  • 03:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2089
  • 03:30 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2089
  • 03:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T391056)', diff saved to https://phabricator.wikimedia.org/P74788 and previous config saved to /var/cache/conftool/dbconfig/20250409-033025-fceratto.json
  • 03:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 03:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74787 and previous config saved to /var/cache/conftool/dbconfig/20250409-033001-fceratto.json
  • 03:26 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2089
  • 03:26 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2089.codfw.wmnet 92.0.192.10.in-addr.arpa 2.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 03:26 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2089.codfw.wmnet 92.0.192.10.in-addr.arpa 2.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 03:26 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2089 - bking@cumin2002"
  • 03:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2089 - bking@cumin2002"
  • 03:20 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 03:20 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2089
  • 03:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2089.codfw.wmnet with OS bullseye
  • 03:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2089.codfw.wmnet on all recursors
  • 03:18 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2089.codfw.wmnet on all recursors
  • 03:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2089 to cirrussearch2089
  • 03:18 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2089
  • 03:17 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2089
  • 03:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:17 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2089 to cirrussearch2089 - bking@cumin2002"
  • 03:15 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2089 to cirrussearch2089 - bking@cumin2002"
  • 03:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74786 and previous config saved to /var/cache/conftool/dbconfig/20250409-031453-fceratto.json
  • 03:09 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 03:09 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2089 to cirrussearch2089
  • 03:08 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2088.codfw.wmnet with OS bullseye
  • 03:08 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
  • 03:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74785 and previous config saved to /var/cache/conftool/dbconfig/20250409-025946-fceratto.json
  • 02:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74784 and previous config saved to /var/cache/conftool/dbconfig/20250409-024439-fceratto.json
  • 02:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 02:34 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2050
  • 02:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2050
  • 02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2049
  • 02:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2049
  • 02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2048
  • 02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
  • 02:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2048
  • 02:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
  • 02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti servers to codfw - jhancock@cumin2002"
  • 02:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti servers to codfw - jhancock@cumin2002"
  • 02:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74783 and previous config saved to /var/cache/conftool/dbconfig/20250409-023156-fceratto.json
  • 02:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 02:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T391056)', diff saved to https://phabricator.wikimedia.org/P74782 and previous config saved to /var/cache/conftool/dbconfig/20250409-023134-fceratto.json
  • 02:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 02:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 02:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 02:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P74781 and previous config saved to /var/cache/conftool/dbconfig/20250409-021626-fceratto.json
  • 02:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P74780 and previous config saved to /var/cache/conftool/dbconfig/20250409-020119-fceratto.json
  • 01:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T391056)', diff saved to https://phabricator.wikimedia.org/P74779 and previous config saved to /var/cache/conftool/dbconfig/20250409-014612-fceratto.json
  • 01:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2088
  • 01:33 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2088
  • 01:33 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2088
  • 01:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2088.codfw.wmnet 91.0.192.10.in-addr.arpa 1.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 01:33 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2088.codfw.wmnet 91.0.192.10.in-addr.arpa 1.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 01:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:33 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2088 - bking@cumin2002"
  • 01:33 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2088 - bking@cumin2002"
  • 01:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T391056)', diff saved to https://phabricator.wikimedia.org/P74778 and previous config saved to /var/cache/conftool/dbconfig/20250409-013316-fceratto.json
  • 01:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 01:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 01:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T391056)', diff saved to https://phabricator.wikimedia.org/P74777 and previous config saved to /var/cache/conftool/dbconfig/20250409-013238-fceratto.json
  • 01:24 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 01:24 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2088
  • 01:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2088.codfw.wmnet with OS bullseye
  • 01:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2088.codfw.wmnet on all recursors
  • 01:23 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2088.codfw.wmnet on all recursors
  • 01:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2088 to cirrussearch2088
  • 01:22 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2088
  • 01:21 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2088
  • 01:21 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:21 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2088 to cirrussearch2088 - bking@cumin2002"
  • 01:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2088 to cirrussearch2088 - bking@cumin2002"
  • 01:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P74776 and previous config saved to /var/cache/conftool/dbconfig/20250409-011731-fceratto.json
  • 01:15 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 01:15 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2088 to cirrussearch2088
  • 01:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2062.codfw.wmnet with OS bullseye
  • 01:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P74775 and previous config saved to /var/cache/conftool/dbconfig/20250409-010224-fceratto.json
  • 00:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2062.codfw.wmnet with reason: host reimage
  • 00:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T391056)', diff saved to https://phabricator.wikimedia.org/P74774 and previous config saved to /var/cache/conftool/dbconfig/20250409-004717-fceratto.json
  • 00:46 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2062.codfw.wmnet with reason: host reimage
  • 00:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T391056)', diff saved to https://phabricator.wikimedia.org/P74773 and previous config saved to /var/cache/conftool/dbconfig/20250409-003434-fceratto.json
  • 00:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 00:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T391056)', diff saved to https://phabricator.wikimedia.org/P74772 and previous config saved to /var/cache/conftool/dbconfig/20250409-003412-fceratto.json
  • 00:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2062
  • 00:29 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2062
  • 00:29 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2062
  • 00:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2062.codfw.wmnet 144.0.192.10.in-addr.arpa 4.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 00:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2062.codfw.wmnet 144.0.192.10.in-addr.arpa 4.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 00:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:29 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2062 - bking@cumin2002"
  • 00:29 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2062 - bking@cumin2002"
  • 00:25 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 00:25 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2062
  • 00:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2062.codfw.wmnet with OS bullseye
  • 00:24 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2062.codfw.wmnet on all recursors
  • 00:24 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2062.codfw.wmnet on all recursors
  • 00:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2062 to cirrussearch2062
  • 00:23 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2062
  • 00:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2062
  • 00:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:23 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2062 to cirrussearch2062 - bking@cumin2002"
  • 00:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2062 to cirrussearch2062 - bking@cumin2002"
  • 00:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P74771 and previous config saved to /var/cache/conftool/dbconfig/20250409-001905-fceratto.json
  • 00:17 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 00:17 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2062 to cirrussearch2062
  • 00:14 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 00:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P74770 and previous config saved to /var/cache/conftool/dbconfig/20250409-000358-fceratto.json

2025-04-09

  • 12:03 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
  • 12:03 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet

2025-04-08

  • 23:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 23:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2069.codfw.wmnet with OS bullseye
  • 23:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T391056)', diff saved to https://phabricator.wikimedia.org/P74769 and previous config saved to /var/cache/conftool/dbconfig/20250408-234850-fceratto.json
  • 23:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2069.codfw.wmnet with reason: host reimage
  • 23:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T391056)', diff saved to https://phabricator.wikimedia.org/P74768 and previous config saved to /var/cache/conftool/dbconfig/20250408-233611-fceratto.json
  • 23:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 23:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T391056)', diff saved to https://phabricator.wikimedia.org/P74767 and previous config saved to /var/cache/conftool/dbconfig/20250408-233549-fceratto.json
  • 23:35 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2069.codfw.wmnet with reason: host reimage
  • 23:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2087.codfw.wmnet with OS bullseye
  • 23:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P74766 and previous config saved to /var/cache/conftool/dbconfig/20250408-232042-fceratto.json
  • 23:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P74765 and previous config saved to /var/cache/conftool/dbconfig/20250408-230535-fceratto.json
  • 23:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2069
  • 23:02 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2069
  • 23:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2087.codfw.wmnet with reason: host reimage
  • 23:02 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2069
  • 23:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2069.codfw.wmnet 142.0.192.10.in-addr.arpa 2.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 23:02 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2069.codfw.wmnet 142.0.192.10.in-addr.arpa 2.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 23:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:02 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2069 - bking@cumin2002"
  • 23:02 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2069 - bking@cumin2002"
  • 22:56 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2087.codfw.wmnet with reason: host reimage
  • 22:56 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 22:56 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2069
  • 22:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2069.codfw.wmnet with OS bullseye
  • 22:55 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2069.codfw.wmnet on all recursors
  • 22:55 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2069.codfw.wmnet on all recursors
  • 22:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2069 to cirrussearch2069
  • 22:54 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2069
  • 22:54 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2069
  • 22:54 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:54 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2069 to cirrussearch2069 - bking@cumin2002"
  • 22:53 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2069 to cirrussearch2069 - bking@cumin2002"
  • 22:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T391056)', diff saved to https://phabricator.wikimedia.org/P74764 and previous config saved to /var/cache/conftool/dbconfig/20250408-225028-fceratto.json
  • 22:49 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 22:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2069 to cirrussearch2069
  • 22:48 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 22:40 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 22:40 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 22:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 22:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 22:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 22:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 22:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2087
  • 22:39 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2087
  • 22:39 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2087
  • 22:39 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2087.codfw.wmnet 90.0.192.10.in-addr.arpa 0.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 22:39 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2087.codfw.wmnet 90.0.192.10.in-addr.arpa 0.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 22:39 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:39 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2087 - bking@cumin2002"
  • 22:38 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2087 - bking@cumin2002"
  • 22:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T391056)', diff saved to https://phabricator.wikimedia.org/P74763 and previous config saved to /var/cache/conftool/dbconfig/20250408-223744-fceratto.json
  • 22:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 22:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T391056)', diff saved to https://phabricator.wikimedia.org/P74762 and previous config saved to /var/cache/conftool/dbconfig/20250408-223721-fceratto.json
  • 22:34 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
  • 22:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
  • 22:30 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2087
  • 22:30 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2087.codfw.wmnet with OS bullseye
  • 22:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 22:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2087.codfw.wmnet on all recursors
  • 22:28 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2087.codfw.wmnet on all recursors
  • 22:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P74761 and previous config saved to /var/cache/conftool/dbconfig/20250408-222213-fceratto.json
  • 22:12 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 22:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P74760 and previous config saved to /var/cache/conftool/dbconfig/20250408-220706-fceratto.json
  • 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2087 to cirrussearch2087
  • 22:04 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2087
  • 22:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2087
  • 22:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2087 to cirrussearch2087 - bking@cumin2002"
  • 22:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2087 to cirrussearch2087 - bking@cumin2002"
  • 22:02 ryankemper: T388610 Elasticsearch->OpenSearch row A data node migration ongoing
  • 21:58 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:58 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2087 to cirrussearch2087
  • 21:57 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (3 nodes at a time) for ElasticSearch cluster search_codfw: reimage row A - bking@cumin2002 - T388610
  • 21:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T391056)', diff saved to https://phabricator.wikimedia.org/P74759 and previous config saved to /var/cache/conftool/dbconfig/20250408-215159-fceratto.json
  • 21:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T391056)', diff saved to https://phabricator.wikimedia.org/P74758 and previous config saved to /var/cache/conftool/dbconfig/20250408-214049-fceratto.json
  • 21:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 21:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for LoginSignupSpecialPage: Get a login token before persisting the session (T390514), LoginSignupSpecialPage: Get a login token before persisting the session (T390514), [BETA CLUSTER] Decommission Beta Wikifunctions (T362200 T363397 T368161 T373464 T389274) (duration: 15m 42s)
  • 21:32 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 21:31 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1257 (T391056)', diff saved to https://phabricator.wikimedia.org/P74757 and previous config saved to /var/cache/conftool/dbconfig/20250408-213136-fceratto.json
  • 21:30 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:30 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:27 ladsgroup@deploy1003: ladsgroup, jforrester: Continuing with sync
  • 21:25 ladsgroup@deploy1003: ladsgroup, jforrester: Backport for LoginSignupSpecialPage: Get a login token before persisting the session (T390514), LoginSignupSpecialPage: Get a login token before persisting the session (T390514), [BETA CLUSTER] Decommission Beta Wikifunctions (T362200 T363397 T368161 T373464 T389274) synced to the testservers (https://wikitech.wikimed
  • 21:19 brett: import libvmod-netmapper 1.9-4 to component/varnish6 bullseye-wikimedia (T391334)
  • 21:18 ladsgroup@deploy1003: Started scap sync-world: Backport for LoginSignupSpecialPage: Get a login token before persisting the session (T390514), LoginSignupSpecialPage: Get a login token before persisting the session (T390514), [BETA CLUSTER] Decommission Beta Wikifunctions (T362200 T363397 T368161 T373464 T389274)
  • 21:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1257', diff saved to https://phabricator.wikimedia.org/P74756 and previous config saved to /var/cache/conftool/dbconfig/20250408-211629-fceratto.json
  • 21:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1257', diff saved to https://phabricator.wikimedia.org/P74755 and previous config saved to /var/cache/conftool/dbconfig/20250408-210121-fceratto.json
  • 20:51 brett: import libvmod-querysort 0.4-2 to component/varnish6 bullseye-wikimedia (T391334)
  • 20:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1257 (T391056)', diff saved to https://phabricator.wikimedia.org/P74754 and previous config saved to /var/cache/conftool/dbconfig/20250408-204615-fceratto.json
  • 20:37 brett: import varnish-modules 0.15.0-3 to component/varnish6 bullseye-wikimedia (T391334)
  • 20:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1257 (T391056)', diff saved to https://phabricator.wikimedia.org/P74753 and previous config saved to /var/cache/conftool/dbconfig/20250408-203618-fceratto.json
  • 20:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1257.eqiad.wmnet with reason: Maintenance
  • 20:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1256.eqiad.wmnet with reason: Maintenance
  • 20:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for ArticleFooterEntrypointCard: Fix display of entrypoint (T389176), ArticleFooterEntrypointCard: Fix display of entrypoint (T389176) (duration: 14m 16s)
  • 20:25 brett: import varnishkafka 1.1.0-4 to component/varnish6 bullseye-wikimedia (T391334)
  • 20:22 brett: import libvmod-re2 1.5.3-4 to component/varnish6 bullseye-wikimedia (T391334)
  • 20:19 ladsgroup@deploy1003: abi, ladsgroup: Continuing with sync
  • 20:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1255.eqiad.wmnet with reason: Maintenance
  • 20:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T391056)', diff saved to https://phabricator.wikimedia.org/P74752 and previous config saved to /var/cache/conftool/dbconfig/20250408-201845-fceratto.json
  • 20:17 ladsgroup@deploy1003: abi, ladsgroup: Backport for ArticleFooterEntrypointCard: Fix display of entrypoint (T389176), ArticleFooterEntrypointCard: Fix display of entrypoint (T389176) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:12 ladsgroup@deploy1003: Started scap sync-world: Backport for ArticleFooterEntrypointCard: Fix display of entrypoint (T389176), ArticleFooterEntrypointCard: Fix display of entrypoint (T389176)
  • 20:12 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases2003.codfw.wmnet with OS bookworm
  • 20:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P74751 and previous config saved to /var/cache/conftool/dbconfig/20250408-200338-fceratto.json
  • 19:56 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
  • 19:53 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
  • 19:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P74750 and previous config saved to /var/cache/conftool/dbconfig/20250408-194831-fceratto.json
  • 19:33 aokoth@cumin1002: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bookworm
  • 19:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T391056)', diff saved to https://phabricator.wikimedia.org/P74749 and previous config saved to /var/cache/conftool/dbconfig/20250408-193324-fceratto.json
  • 19:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T391056)', diff saved to https://phabricator.wikimedia.org/P74748 and previous config saved to /var/cache/conftool/dbconfig/20250408-192147-fceratto.json
  • 19:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 19:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 19:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74747 and previous config saved to /var/cache/conftool/dbconfig/20250408-191215-fceratto.json
  • 18:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P74746 and previous config saved to /var/cache/conftool/dbconfig/20250408-185708-fceratto.json
  • 18:46 dancy@deploy1003: Installation of scap version "4.152.0" completed for 2 hosts
  • 18:44 dancy@deploy1003: Installing scap version "4.152.0" for 2 host(s)
  • 18:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P74745 and previous config saved to /var/cache/conftool/dbconfig/20250408-184201-fceratto.json
  • 18:34 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.24 refs T386219
  • 18:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74743 and previous config saved to /var/cache/conftool/dbconfig/20250408-182654-fceratto.json
  • 18:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74742 and previous config saved to /var/cache/conftool/dbconfig/20250408-181513-fceratto.json
  • 18:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 18:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74741 and previous config saved to /var/cache/conftool/dbconfig/20250408-181450-fceratto.json
  • 18:08 brennen: 1.44.0-wmf.24 train status: no current blockers, moving to group0
  • 18:03 brett: import varnish 6.0.13-1wm1 to component/varnish6 bullseye-wikimedia (T391334)
  • 17:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P74740 and previous config saved to /var/cache/conftool/dbconfig/20250408-175944-fceratto.json
  • 17:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P74739 and previous config saved to /var/cache/conftool/dbconfig/20250408-174436-fceratto.json
  • 17:38 swfrench@deploy1003: Finished scap sync-world: Pilot scap run using PHP 8.1 container image for maintenance scripts - T390225 (duration: 03m 19s)
  • 17:35 swfrench@deploy1003: Started scap sync-world: Pilot scap run using PHP 8.1 container image for maintenance scripts - T390225
  • 17:32 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 17:30 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
  • 17:30 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 17:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74738 and previous config saved to /var/cache/conftool/dbconfig/20250408-172929-fceratto.json
  • 17:29 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 17:22 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host nokiatest2001.codfw.wmnet
  • 17:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T391056)', diff saved to https://phabricator.wikimedia.org/P74737 and previous config saved to /var/cache/conftool/dbconfig/20250408-171753-fceratto.json
  • 17:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 17:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T391056)', diff saved to https://phabricator.wikimedia.org/P74736 and previous config saved to /var/cache/conftool/dbconfig/20250408-171731-fceratto.json
  • 17:15 swfrench@deploy1003: Stopping before sync operations
  • 17:14 swfrench@deploy1003: Started scap sync-world: Pilot stop-before-sync scap run using PHP 8.1 container image for maintenance scripts - T390225
  • 17:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P74735 and previous config saved to /var/cache/conftool/dbconfig/20250408-170224-fceratto.json
  • 16:51 ladsgroup@deploy1003: Finished scap sync-world: Backport for Revert "Temporarily enable mobile sitenotice for fawiki" (duration: 20m 49s)
  • 16:50 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:50 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:50 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:50 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:50 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:50 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P74734 and previous config saved to /var/cache/conftool/dbconfig/20250408-164717-fceratto.json
  • 16:45 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 16:45 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 16:44 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply
  • 16:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:41 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:41 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:41 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:40 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:40 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:38 ladsgroup@deploy1003: ladsgroup: Backport for Revert "Temporarily enable mobile sitenotice for fawiki" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T391056)', diff saved to https://phabricator.wikimedia.org/P74733 and previous config saved to /var/cache/conftool/dbconfig/20250408-163210-fceratto.json
  • 16:31 ladsgroup@deploy1003: Started scap sync-world: Backport for Revert "Temporarily enable mobile sitenotice for fawiki"
  • 16:24 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:24 hnowlan: running 'ipvsadm --delete-service --tcp-service 10.2.2.26:443 && ipvsadm --delete-service --tcp-service 10.2.2.5:443' on eqiad lvs to remove videoscaler and jobrunner services
  • 16:24 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:24 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:24 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:24 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:23 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:22 hnowlan: running 'ipvsadm --delete-service --tcp-service 10.2.2.26:443 && ipvsadm --delete-service --tcp-service 10.2.2.5:443' on codfw lvs to remove videoscaler and jobrunner services
  • 16:21 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:21 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:21 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:21 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:21 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:20 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T391056)', diff saved to https://phabricator.wikimedia.org/P74732 and previous config saved to /var/cache/conftool/dbconfig/20250408-162029-fceratto.json
  • 16:20 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T391056)', diff saved to https://phabricator.wikimedia.org/P74731 and previous config saved to /var/cache/conftool/dbconfig/20250408-162007-fceratto.json
  • 16:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P74730 and previous config saved to /var/cache/conftool/dbconfig/20250408-160501-fceratto.json
  • 15:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P74729 and previous config saved to /var/cache/conftool/dbconfig/20250408-154954-fceratto.json
  • 15:48 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host nokiatest2001.codfw.wmnet
  • 15:45 herron@cumin1002: dbctl commit (dc=all): 'depooling db1246', diff saved to https://phabricator.wikimedia.org/P74728 and previous config saved to /var/cache/conftool/dbconfig/20250408-154509-herron.json
  • 15:39 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:37 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T391056)', diff saved to https://phabricator.wikimedia.org/P74727 and previous config saved to /var/cache/conftool/dbconfig/20250408-153446-fceratto.json
  • 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T391056)', diff saved to https://phabricator.wikimedia.org/P74726 and previous config saved to /var/cache/conftool/dbconfig/20250408-152212-fceratto.json
  • 15:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 15:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74725 and previous config saved to /var/cache/conftool/dbconfig/20250408-152150-fceratto.json
  • 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P74724 and previous config saved to /var/cache/conftool/dbconfig/20250408-150643-fceratto.json
  • 15:03 brennen@deploy1003: Finished deploy [phabricator/deployment@99aa712]: deploy phab1004 for T391357 (duration: 00m 38s)
  • 15:03 brennen@deploy1003: Started deploy [phabricator/deployment@99aa712]: deploy phab1004 for T391357
  • 15:02 brennen@deploy1003: Finished deploy [phabricator/deployment@99aa712]: test deploy phab2002 for T391357 (duration: 00m 42s)
  • 15:02 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet with reason: T391357
  • 15:02 brennen@deploy1003: Started deploy [phabricator/deployment@99aa712]: test deploy phab2002 for T391357
  • 15:01 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: T391357
  • 14:54 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host releases2003.codfw.wmnet with OS bookworm
  • 14:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P74723 and previous config saved to /var/cache/conftool/dbconfig/20250408-145136-fceratto.json
  • 14:36 hnowlan: restarting pybal on A:lvs-low-traffic-codfw to remove jobrunner and videoscaler
  • 14:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74722 and previous config saved to /var/cache/conftool/dbconfig/20250408-143628-fceratto.json
  • 14:31 hnowlan: restarting pybal on A:lvs-secondary-codfw
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T391056)', diff saved to https://phabricator.wikimedia.org/P74721 and previous config saved to /var/cache/conftool/dbconfig/20250408-142347-fceratto.json
  • 14:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74720 and previous config saved to /var/cache/conftool/dbconfig/20250408-142335-fceratto.json
  • 14:22 hnowlan: restarting pybal on lvs1019 (low-traffic primary) to pick up removal of jobrunner and videoscaler
  • 14:19 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddumps1001.wikimedia.org with reason: down for maintenance
  • 14:12 hnowlan: restarting pybal on A:lvs-secondary-eqiad to pick up removal of jobrunner and videoscaler
  • 14:11 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:10 hnowlan: setting jobrunner and videoscaler to service_setup in puppet
  • 14:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P74718 and previous config saved to /var/cache/conftool/dbconfig/20250408-140828-fceratto.json
  • 14:08 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 14:07 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 14:07 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 14:06 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 14:04 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for ArticleFooterEntrypointCard: Change the way codex is loaded (T389176), ArticleFooterEntrypointCard: Change the way codex is loaded (T389176) (duration: 22m 23s)
  • 14:02 aokoth@cumin1002: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bookworm
  • 13:59 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 13:59 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 13:57 lucaswerkmeister-wmde@deploy1003: abi, lucaswerkmeister-wmde: Continuing with sync
  • 13:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P74717 and previous config saved to /var/cache/conftool/dbconfig/20250408-135321-fceratto.json
  • 13:49 lucaswerkmeister-wmde@deploy1003: abi, lucaswerkmeister-wmde: Backport for ArticleFooterEntrypointCard: Change the way codex is loaded (T389176), ArticleFooterEntrypointCard: Change the way codex is loaded (T389176) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:45 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on releases2003.codfw.wmnet with reason: Bookworm Re-image
  • 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for ArticleFooterEntrypointCard: Change the way codex is loaded (T389176), ArticleFooterEntrypointCard: Change the way codex is loaded (T389176)
  • 13:38 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Increase entityAccessLimit from 400 to 500 for all wikis except commons. (T384455), Remove unused config vars (T389429), Fix EntitySchema propertyType on Test Wikidata (T371196) (duration: 15m 30s)
  • 13:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74716 and previous config saved to /var/cache/conftool/dbconfig/20250408-133814-fceratto.json
  • 13:31 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, ebernhardson, seanleong-wmde: Continuing with sync
  • 13:30 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, ebernhardson, seanleong-wmde: Backport for Increase entityAccessLimit from 400 to 500 for all wikis except commons. (T384455), Remove unused config vars (T389429), Fix EntitySchema propertyType on Test Wikidata (T371196) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T391056)', diff saved to https://phabricator.wikimedia.org/P74715 and previous config saved to /var/cache/conftool/dbconfig/20250408-132626-fceratto.json
  • 13:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74714 and previous config saved to /var/cache/conftool/dbconfig/20250408-132603-fceratto.json
  • 13:22 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Increase entityAccessLimit from 400 to 500 for all wikis except commons. (T384455), Remove unused config vars (T389429), Fix EntitySchema propertyType on Test Wikidata (T371196)
  • 13:18 Lucas_WMDE: lucaswerkmeister-wmde@deploy1003 ~ $ mwscript-k8s --comment=T391299 --follow -- namespaceDupes ptwiktionary --fix
  • 13:17 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [ptwiktionary] Create a Wikisaurus namespace (T391299) (duration: 15m 24s)
  • 13:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P74712 and previous config saved to /var/cache/conftool/dbconfig/20250408-131056-fceratto.json
  • 13:10 lucaswerkmeister-wmde@deploy1003: superpes, lucaswerkmeister-wmde: Continuing with sync
  • 13:09 lucaswerkmeister-wmde@deploy1003: superpes, lucaswerkmeister-wmde: Backport for [ptwiktionary] Create a Wikisaurus namespace (T391299) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:08 marostegui: TEST maintenance s1 eqiad dbmaint T391346
  • 13:02 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [ptwiktionary] Create a Wikisaurus namespace (T391299)
  • 12:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
  • 12:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
  • 12:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P74711 and previous config saved to /var/cache/conftool/dbconfig/20250408-125549-fceratto.json
  • 12:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
  • 12:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
  • 12:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74709 and previous config saved to /var/cache/conftool/dbconfig/20250408-124042-fceratto.json
  • 12:35 elukey: started the rollout of xz-utils' security upgrades (gradual during the next days)
  • 12:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T391056)', diff saved to https://phabricator.wikimedia.org/P74708 and previous config saved to /var/cache/conftool/dbconfig/20250408-122919-fceratto.json
  • 12:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T391056)', diff saved to https://phabricator.wikimedia.org/P74707 and previous config saved to /var/cache/conftool/dbconfig/20250408-122859-fceratto.json
  • 12:14 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P74706 and previous config saved to /var/cache/conftool/dbconfig/20250408-121352-fceratto.json
  • 12:13 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:13 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:12 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P74705 and previous config saved to /var/cache/conftool/dbconfig/20250408-115845-fceratto.json
  • 11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T391056)', diff saved to https://phabricator.wikimedia.org/P74704 and previous config saved to /var/cache/conftool/dbconfig/20250408-114338-fceratto.json
  • 11:39 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:39 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T391056)', diff saved to https://phabricator.wikimedia.org/P74703 and previous config saved to /var/cache/conftool/dbconfig/20250408-113154-fceratto.json
  • 11:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host wikikube-worker2142.codfw.wmnet
  • 11:27 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2142.codfw.wmnet
  • 11:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74702 and previous config saved to /var/cache/conftool/dbconfig/20250408-112124-fceratto.json
  • 11:13 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 75% (T360589) (duration: 16m 35s)
  • 11:06 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 11:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P74701 and previous config saved to /var/cache/conftool/dbconfig/20250408-110618-fceratto.json
  • 11:04 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 75% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:02 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 11:02 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 11:01 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=thanos-fe2007.codfw.wmnet
  • 11:01 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=thanos-fe2006.codfw.wmnet
  • 11:01 mvernon@cumin2002: conftool action : set/weight=100; selector: name=thanos-fe2007.codfw.wmnet
  • 11:01 mvernon@cumin2002: conftool action : set/weight=100; selector: name=thanos-fe2006.codfw.wmnet
  • 11:01 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=thanos-fe2005.codfw.wmnet
  • 11:00 mvernon@cumin2002: conftool action : set/weight=100; selector: name=thanos-fe2005.codfw.wmnet
  • 11:00 Emperor: pool thanos-fe200[5-7] T389634
  • 11:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3008.esams.wmnet} and A:liberica
  • 10:59 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3008.esams.wmnet} and A:liberica
  • 10:57 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 75% (T360589)
  • 10:56 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 10:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P74700 and previous config saved to /var/cache/conftool/dbconfig/20250408-105111-fceratto.json
  • 10:51 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 10:48 hnowlan@dns1004: END - running authdns-update
  • 10:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2007.codfw.wmnet
  • 10:45 hnowlan@dns1004: START - running authdns-update
  • 10:41 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-fe2007.codfw.wmnet
  • 10:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2006.codfw.wmnet
  • 10:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74699 and previous config saved to /var/cache/conftool/dbconfig/20250408-103604-fceratto.json
  • 10:34 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 10:33 jelto: restart mailman3.service on lists1004 - T391330
  • 10:33 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 10:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-fe2006.codfw.wmnet
  • 10:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2005.codfw.wmnet
  • 10:25 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-fe2005.codfw.wmnet
  • 10:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T391056)', diff saved to https://phabricator.wikimedia.org/P74698 and previous config saved to /var/cache/conftool/dbconfig/20250408-102412-fceratto.json
  • 10:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:57 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:42 ozge@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:08 akosiaris@dns1004: END - running authdns-update
  • 09:05 akosiaris@dns1004: START - running authdns-update
  • 08:29 kartik@deploy1003: Finished scap sync-world: Backport for EventStreamConfig: Add RRLA prediction_change stream (T326179) (duration: 23m 21s)
  • 08:29 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2003.codfw.wmnet
  • 08:29 mvernon@cumin2002: conftool action : set/weight=40; selector: service=apus,name=apus-fe2003.codfw.wmnet
  • 08:28 Emperor: pool apus-fe2003 T390578
  • 08:22 kartik@deploy1003: kartik, kevinbazira: Continuing with sync
  • 08:12 kartik@deploy1003: kartik, kevinbazira: Backport for EventStreamConfig: Add RRLA prediction_change stream (T326179) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repool ms1 T391317', diff saved to https://phabricator.wikimedia.org/P74695 and previous config saved to /var/cache/conftool/dbconfig/20250408-081248-marostegui.json
  • 08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool ms1 T391317', diff saved to https://phabricator.wikimedia.org/P74694 and previous config saved to /var/cache/conftool/dbconfig/20250408-081224-marostegui.json
  • 08:05 kartik@deploy1003: Started scap sync-world: Backport for EventStreamConfig: Add RRLA prediction_change stream (T326179)
  • 08:02 kartik@deploy1003: Finished scap sync-world: Backport for AX: Enable entry-points on Tswana and Venetian wiki (T390023) (duration: 21m 33s)
  • 07:55 kartik@deploy1003: abi, kartik: Continuing with sync
  • 07:48 kartik@deploy1003: abi, kartik: Backport for AX: Enable entry-points on Tswana and Venetian wiki (T390023) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:41 kartik@deploy1003: Started scap sync-world: Backport for AX: Enable entry-points on Tswana and Venetian wiki (T390023)
  • 07:35 slyngshede@dns1004: END - running authdns-update
  • 07:34 kartik@deploy1003: Finished scap sync-world: Backport for AX: Enable Quick Surveys extension on Tswana and Venetian wiki (T390023) (duration: 20m 27s)
  • 07:33 slyngshede@dns1004: START - running authdns-update
  • 07:25 kartik@deploy1003: abi, kartik: Continuing with sync
  • 07:21 kartik@deploy1003: abi, kartik: Backport for AX: Enable Quick Surveys extension on Tswana and Venetian wiki (T390023) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:13 kartik@deploy1003: Started scap sync-world: Backport for AX: Enable Quick Surveys extension on Tswana and Venetian wiki (T390023)
  • 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repool ms2 T391317', diff saved to https://phabricator.wikimedia.org/P74693 and previous config saved to /var/cache/conftool/dbconfig/20250408-064813-marostegui.json
  • 06:45 marostegui: Upgrade ms2 to MariaDB 10.11 codfw eqiad dbmaint T391317
  • 06:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool ms2 T391317', diff saved to https://phabricator.wikimedia.org/P74692 and previous config saved to /var/cache/conftool/dbconfig/20250408-064250-marostegui.json
  • 04:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T391056)', diff saved to https://phabricator.wikimedia.org/P74691 and previous config saved to /var/cache/conftool/dbconfig/20250408-045801-fceratto.json
  • 04:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P74690 and previous config saved to /var/cache/conftool/dbconfig/20250408-044254-fceratto.json
  • 04:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P74689 and previous config saved to /var/cache/conftool/dbconfig/20250408-042748-fceratto.json
  • 04:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T391056)', diff saved to https://phabricator.wikimedia.org/P74688 and previous config saved to /var/cache/conftool/dbconfig/20250408-041241-fceratto.json
  • 04:09 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.21 (duration: 09m 26s)
  • 04:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T391056)', diff saved to https://phabricator.wikimedia.org/P74687 and previous config saved to /var/cache/conftool/dbconfig/20250408-040728-fceratto.json
  • 04:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 04:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T391056)', diff saved to https://phabricator.wikimedia.org/P74686 and previous config saved to /var/cache/conftool/dbconfig/20250408-040706-fceratto.json
  • 04:06 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.44.0-wmf.24 refs T386219 (duration: 63m 43s)
  • 03:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P74685 and previous config saved to /var/cache/conftool/dbconfig/20250408-035159-fceratto.json
  • 03:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P74684 and previous config saved to /var/cache/conftool/dbconfig/20250408-033652-fceratto.json
  • 03:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T391056)', diff saved to https://phabricator.wikimedia.org/P74683 and previous config saved to /var/cache/conftool/dbconfig/20250408-032145-fceratto.json
  • 03:16 cstone: payments-wiki upgraded from 10b6cf1d to ef9284aa
  • 03:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T391056)', diff saved to https://phabricator.wikimedia.org/P74682 and previous config saved to /var/cache/conftool/dbconfig/20250408-031632-fceratto.json
  • 03:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 03:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74681 and previous config saved to /var/cache/conftool/dbconfig/20250408-031609-fceratto.json
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.24 refs T386219
  • 03:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P74680 and previous config saved to /var/cache/conftool/dbconfig/20250408-030102-fceratto.json
  • 02:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P74679 and previous config saved to /var/cache/conftool/dbconfig/20250408-024555-fceratto.json
  • 02:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74678 and previous config saved to /var/cache/conftool/dbconfig/20250408-023047-fceratto.json
  • 02:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T391056)', diff saved to https://phabricator.wikimedia.org/P74677 and previous config saved to /var/cache/conftool/dbconfig/20250408-022538-fceratto.json
  • 02:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 02:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 02:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T391056)', diff saved to https://phabricator.wikimedia.org/P74676 and previous config saved to /var/cache/conftool/dbconfig/20250408-022146-fceratto.json
  • 02:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P74675 and previous config saved to /var/cache/conftool/dbconfig/20250408-020639-fceratto.json
  • 01:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P74674 and previous config saved to /var/cache/conftool/dbconfig/20250408-015132-fceratto.json
  • 01:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T391056)', diff saved to https://phabricator.wikimedia.org/P74673 and previous config saved to /var/cache/conftool/dbconfig/20250408-013625-fceratto.json
  • 01:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T391056)', diff saved to https://phabricator.wikimedia.org/P74672 and previous config saved to /var/cache/conftool/dbconfig/20250408-013412-fceratto.json
  • 01:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 01:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74671 and previous config saved to /var/cache/conftool/dbconfig/20250408-013348-fceratto.json
  • 01:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P74670 and previous config saved to /var/cache/conftool/dbconfig/20250408-011841-fceratto.json
  • 01:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P74669 and previous config saved to /var/cache/conftool/dbconfig/20250408-010334-fceratto.json
  • 00:48 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1202.eqiad.wmnet
  • 00:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74668 and previous config saved to /var/cache/conftool/dbconfig/20250408-004827-fceratto.json
  • 00:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74667 and previous config saved to /var/cache/conftool/dbconfig/20250408-004715-fceratto.json
  • 00:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 00:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T391056)', diff saved to https://phabricator.wikimedia.org/P74666 and previous config saved to /var/cache/conftool/dbconfig/20250408-004652-fceratto.json
  • 00:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1202.eqiad.wmnet
  • 00:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P74665 and previous config saved to /var/cache/conftool/dbconfig/20250408-003144-fceratto.json
  • 00:22 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1202.eqiad.wmnet
  • 00:21 btullis@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1202.eqiad.wmnet
  • 00:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P74664 and previous config saved to /var/cache/conftool/dbconfig/20250408-001637-fceratto.json
  • 00:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1202.eqiad.wmnet with OS bullseye
  • 00:12 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
  • 00:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T391056)', diff saved to https://phabricator.wikimedia.org/P74663 and previous config saved to /var/cache/conftool/dbconfig/20250408-000130-fceratto.json

2025-04-07

  • 23:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T391056)', diff saved to https://phabricator.wikimedia.org/P74662 and previous config saved to /var/cache/conftool/dbconfig/20250407-235541-fceratto.json
  • 23:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 23:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T391056)', diff saved to https://phabricator.wikimedia.org/P74661 and previous config saved to /var/cache/conftool/dbconfig/20250407-235518-fceratto.json
  • 23:44 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
  • 23:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P74660 and previous config saved to /var/cache/conftool/dbconfig/20250407-234011-fceratto.json
  • 23:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P74659 and previous config saved to /var/cache/conftool/dbconfig/20250407-232503-fceratto.json
  • 23:21 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1202.eqiad.wmnet with reason: host reimage
  • 23:18 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1202.eqiad.wmnet with reason: host reimage
  • 23:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T391056)', diff saved to https://phabricator.wikimedia.org/P74658 and previous config saved to /var/cache/conftool/dbconfig/20250407-230956-fceratto.json
  • 23:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T391056)', diff saved to https://phabricator.wikimedia.org/P74657 and previous config saved to /var/cache/conftool/dbconfig/20250407-230411-fceratto.json
  • 23:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 23:03 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
  • 23:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 23:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T391056)', diff saved to https://phabricator.wikimedia.org/P74656 and previous config saved to /var/cache/conftool/dbconfig/20250407-230333-fceratto.json
  • 22:57 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1171.eqiad.wmnet
  • 22:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P74655 and previous config saved to /var/cache/conftool/dbconfig/20250407-224827-fceratto.json
  • 22:35 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1171.eqiad.wmnet
  • 22:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P74654 and previous config saved to /var/cache/conftool/dbconfig/20250407-223319-fceratto.json
  • 22:32 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_magru
  • 22:30 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1170.eqiad.wmnet
  • 22:26 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1170.eqiad.wmnet
  • 22:26 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1171.eqiad.wmnet
  • 22:24 btullis@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1171.eqiad.wmnet
  • 22:24 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1170.eqiad.wmnet
  • 22:20 btullis@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1170.eqiad.wmnet
  • 22:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T391056)', diff saved to https://phabricator.wikimedia.org/P74653 and previous config saved to /var/cache/conftool/dbconfig/20250407-221812-fceratto.json
  • 22:16 ejegg: civicrm upgraded from f7beb984 to b20436a2
  • 22:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T391056)', diff saved to https://phabricator.wikimedia.org/P74652 and previous config saved to /var/cache/conftool/dbconfig/20250407-221224-fceratto.json
  • 22:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 22:12 ejegg: civicrm upgraded from 73533b73 to f7beb984
  • 22:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 22:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T391056)', diff saved to https://phabricator.wikimedia.org/P74651 and previous config saved to /var/cache/conftool/dbconfig/20250407-220851-fceratto.json
  • 21:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P74650 and previous config saved to /var/cache/conftool/dbconfig/20250407-215342-fceratto.json
  • 21:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P74649 and previous config saved to /var/cache/conftool/dbconfig/20250407-213835-fceratto.json
  • 21:26 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp700[3-8].magru.wmnet} and A:cp
  • 21:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T391056)', diff saved to https://phabricator.wikimedia.org/P74647 and previous config saved to /var/cache/conftool/dbconfig/20250407-212328-fceratto.json
  • 21:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T391056)', diff saved to https://phabricator.wikimedia.org/P74646 and previous config saved to /var/cache/conftool/dbconfig/20250407-212220-fceratto.json
  • 21:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 21:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 21:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T391056)', diff saved to https://phabricator.wikimedia.org/P74645 and previous config saved to /var/cache/conftool/dbconfig/20250407-211835-fceratto.json
  • 21:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P74644 and previous config saved to /var/cache/conftool/dbconfig/20250407-210328-fceratto.json
  • 20:55 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2056.codfw.wmnet
  • 20:55 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2055.codfw.wmnet
  • 20:49 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 20:49 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 20:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P74643 and previous config saved to /var/cache/conftool/dbconfig/20250407-204821-fceratto.json
  • 20:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1096.eqiad.wmnet
  • 20:39 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1096.eqiad.wmnet
  • 20:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T391056)', diff saved to https://phabricator.wikimedia.org/P74642 and previous config saved to /var/cache/conftool/dbconfig/20250407-203313-fceratto.json
  • 20:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T391056)', diff saved to https://phabricator.wikimedia.org/P74641 and previous config saved to /var/cache/conftool/dbconfig/20250407-203205-fceratto.json
  • 20:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 20:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T391056)', diff saved to https://phabricator.wikimedia.org/P74640 and previous config saved to /var/cache/conftool/dbconfig/20250407-203142-fceratto.json
  • 20:20 James_F: Backport window complete.
  • 20:19 jforrester@deploy1003: Finished scap sync-world: Backport for search-redirect: Handle $_GET potential vulnerability scanning (T389019), wikifunctionswiki: Make 'native' mode the default for Maths (duration: 14m 06s)
  • 20:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P74638 and previous config saved to /var/cache/conftool/dbconfig/20250407-201635-fceratto.json
  • 20:12 jforrester@deploy1003: jforrester: Continuing with sync
  • 20:09 jforrester@deploy1003: jforrester: Backport for search-redirect: Handle $_GET potential vulnerability scanning (T389019), wikifunctionswiki: Make 'native' mode the default for Maths synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 jforrester@deploy1003: Started scap sync-world: Backport for search-redirect: Handle $_GET potential vulnerability scanning (T389019), wikifunctionswiki: Make 'native' mode the default for Maths
  • 20:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P74637 and previous config saved to /var/cache/conftool/dbconfig/20250407-200128-fceratto.json
  • 19:48 urandom: extending vg0/srv logical volume, sessionstore100[4-6].eqiad.wmnet — T390514
  • 19:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T391056)', diff saved to https://phabricator.wikimedia.org/P74636 and previous config saved to /var/cache/conftool/dbconfig/20250407-194621-fceratto.json
  • 19:44 urandom: extending vg0/srv logical volume, sessionstore2006 — T390514
  • 19:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T391056)', diff saved to https://phabricator.wikimedia.org/P74635 and previous config saved to /var/cache/conftool/dbconfig/20250407-194412-fceratto.json
  • 19:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 19:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74634 and previous config saved to /var/cache/conftool/dbconfig/20250407-194350-fceratto.json
  • 19:41 urandom: extending vg0/srv logical volume, sessionstore2005 — T390514
  • 19:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P74633 and previous config saved to /var/cache/conftool/dbconfig/20250407-192842-fceratto.json
  • 19:19 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1096* for ban node to stop high rejection rates - bking@cumin2002
  • 19:19 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1096* for ban node to stop high rejection rates - bking@cumin2002
  • 19:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P74632 and previous config saved to /var/cache/conftool/dbconfig/20250407-191335-fceratto.json
  • 19:12 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1202.eqiad.wmnet with OS bullseye
  • 19:06 urandom: extending vg0/srv logical volume, sessionstore2004 — T390514
  • 18:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74631 and previous config saved to /var/cache/conftool/dbconfig/20250407-185828-fceratto.json
  • 18:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T391056)', diff saved to https://phabricator.wikimedia.org/P74630 and previous config saved to /var/cache/conftool/dbconfig/20250407-185619-fceratto.json
  • 18:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 18:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T391056)', diff saved to https://phabricator.wikimedia.org/P74629 and previous config saved to /var/cache/conftool/dbconfig/20250407-185556-fceratto.json
  • 18:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_magru
  • 18:49 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp700[3-8].magru.wmnet} and A:cp
  • 18:48 wfan: payments-wiki upgraded from 646f47bf to 10b6cf1d
  • 18:43 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7002.magru.wmnet
  • 18:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P74628 and previous config saved to /var/cache/conftool/dbconfig/20250407-184049-fceratto.json
  • 18:32 dancy@deploy1003: Finished scap sync-world: testing (duration: 05m 35s)
  • 18:30 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
  • 18:26 dancy@deploy1003: Started scap sync-world: testing
  • 18:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P74627 and previous config saved to /var/cache/conftool/dbconfig/20250407-182542-fceratto.json
  • 18:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T391056)', diff saved to https://phabricator.wikimedia.org/P74625 and previous config saved to /var/cache/conftool/dbconfig/20250407-181035-fceratto.json
  • 18:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
  • 18:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T391056)', diff saved to https://phabricator.wikimedia.org/P74624 and previous config saved to /var/cache/conftool/dbconfig/20250407-180927-fceratto.json
  • 18:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 18:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74623 and previous config saved to /var/cache/conftool/dbconfig/20250407-180905-fceratto.json
  • 18:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1202.eqiad.wmnet with OS bullseye
  • 17:59 brett: Upload varnishkafka 1.2.0-2 to bullseye-wikimedia (T389605)
  • 17:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P74622 and previous config saved to /var/cache/conftool/dbconfig/20250407-175358-fceratto.json
  • 17:50 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Xiaoxiao out of all services on: 2397 hosts
  • 17:44 brett: Remove libvmod-netmapper, libvmod-querysort, varnish-re2, varnish, varnishkafka, varnish-modules from bullseye-wikimedia component/varnish-staging
  • 17:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P74621 and previous config saved to /var/cache/conftool/dbconfig/20250407-173851-fceratto.json
  • 17:27 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7002.magru.wmnet
  • 17:26 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
  • 17:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74620 and previous config saved to /var/cache/conftool/dbconfig/20250407-172343-fceratto.json
  • 17:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T391056)', diff saved to https://phabricator.wikimedia.org/P74619 and previous config saved to /var/cache/conftool/dbconfig/20250407-172234-fceratto.json
  • 17:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 17:17 brett: Re-enabling Puppet on A:cp (T378737)
  • 17:04 brett: Disabling puppet on A:cp to roll out removal of varnish 6/7 template switching (T378737)
  • 17:04 dancy@deploy1003: Installation of scap version "4.151.0" completed for 190 hosts
  • 16:59 dancy@deploy1003: Installing scap version "4.151.0" for 190 host(s)
  • 16:54 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Xiaoxiao out of all services on: 2396 hosts
  • 16:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
  • 16:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1202.eqiad.wmnet with OS bullseye
  • 16:33 brett: Upload ncmonitor 1.3.4-1 to bookworm-wikimedia
  • 16:30 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1003* for test ban syntax - bking@cumin2002 - T391151
  • 16:30 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1003* for test ban syntax - bking@cumin2002 - T391151
  • 16:29 mforns@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 16:29 mforns@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 16:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
  • 16:24 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-worker1202.eqiad.wmnet on all recursors
  • 16:24 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache an-worker1202.eqiad.wmnet on all recursors
  • 16:23 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-worker1202.eqiad.wmnet on all recursors
  • 16:23 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache an-worker1202.eqiad.wmnet on all recursors
  • 16:17 mforns@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 16:17 mforns@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 16:15 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
  • 16:15 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
  • 16:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:08 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1004* for test ban syntax - bking@cumin2002 - T391151
  • 16:08 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1004* for test ban syntax - bking@cumin2002 - T391151
  • 16:07 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in relforge
  • 16:07 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in relforge
  • 15:59 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:58 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 15:49 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
  • 15:49 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
  • 15:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
  • 15:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
  • 15:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:40 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:28 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
  • 15:28 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
  • 15:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:23 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
  • 15:23 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
  • 15:22 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-worker1202 - jclark@cumin1002"
  • 15:22 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-worker1202 - jclark@cumin1002"
  • 15:21 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2016.codfw.wmnet
  • 15:21 mvernon@cumin1002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2016.codfw.wmnet
  • 15:21 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2016.codfw.wmnet
  • 15:21 mvernon@cumin1002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2016.codfw.wmnet
  • 15:21 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2015.codfw.wmnet
  • 15:21 mvernon@cumin1002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2015.codfw.wmnet
  • 15:21 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2015.codfw.wmnet
  • 15:21 mvernon@cumin1002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2015.codfw.wmnet
  • 15:21 Emperor: pool ms-fe2015 ms-fe2016 T388887
  • 15:20 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 15:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 15:16 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 15:11 elukey@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 15:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2016.codfw.wmnet
  • 15:07 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2015.codfw.wmnet
  • 15:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:04 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:03 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2016.codfw.wmnet
  • 15:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1202.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:01 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-fe2015.codfw.wmnet
  • 15:01 elukey@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
  • 15:01 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1202
  • 15:01 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
  • 14:59 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1202
  • 14:59 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1202
  • 14:55 urandom: enabling unchecked_tombstone_compaction on sessionstore Cassandra — T390514
  • 14:31 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1169
  • 14:31 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1169
  • 14:29 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cirrussearch[2055-2056].codfw.wmnet with reason: adding net-new role
  • 14:12 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:09 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 14:09 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 14:09 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 14:09 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:09 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:07 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:07 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:07 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
  • 14:01 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:01 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
  • 14:00 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:00 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 13:59 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 13:54 James_F: Backport window complete.
  • 13:52 jforrester@deploy1003: Finished scap sync-world: Backport for Improve GeoCrumbs fallback when page property is not (yet) set (T391128) (duration: 13m 25s)
  • 13:44 jforrester@deploy1003: jforrester, cscott: Continuing with sync
  • 13:44 fabfur: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134689 on A:cp-esams (T384227)
  • 13:44 jforrester@deploy1003: jforrester, cscott: Backport for Improve GeoCrumbs fallback when page property is not (yet) set (T391128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:43 fabfur: disable puppet on A:cp-esams to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134689 (T384227)
  • 13:38 jforrester@deploy1003: Started scap sync-world: Backport for Improve GeoCrumbs fallback when page property is not (yet) set (T391128)
  • 13:38 jforrester@deploy1003: Sync cancelled.
  • 13:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:36 jforrester@deploy1003: jforrester, cscott: Backport for Improve GeoCrumbs fallback when page property is not (yet) set (T391128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:36 taavi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudsw-b1.private.codfw.wikimedia.cloud on codfw recursors
  • 13:36 taavi@cumin1002: START - Cookbook sre.dns.wipe-cache cloudsw-b1.private.codfw.wikimedia.cloud on codfw recursors
  • 13:36 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.9-1wm1_amd64.changes: T390912
  • 13:35 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:34 sukhe: sudo -i reprepro remove bullseye-wikimedia trafficserver: T390912
  • 13:34 sukhe: sudo -i reprepro remove bullseye-wikimedia trafficserver
  • 13:32 sukhe: depool cp4037: reverting to ATS 9.2.9
  • 13:30 jforrester@deploy1003: Started scap sync-world: Backport for Improve GeoCrumbs fallback when page property is not (yet) set (T391128)
  • 13:26 jforrester@deploy1003: Finished scap sync-world: Backport for Shift to Parsoid Fragment support v3 (T390420), Where Parsoid Read Views are the default, use it for MFE as well (T376048 T374578) (duration: 20m 54s)
  • 13:24 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4037.*} and A:cp for 9.2.10-1wm1
  • 13:22 sukhe: P{cp4037.*} and A:cp for 9.2.10-1wm1 T390912
  • 13:21 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4037.*} and A:cp for 9.2.10-1wm1
  • 13:20 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 13:20 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 13:20 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 13:19 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 13:19 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 13:19 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 13:18 jforrester@deploy1003: jforrester, cscott: Continuing with sync
  • 13:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
  • 13:13 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
  • 13:10 jforrester@deploy1003: jforrester, cscott: Backport for Shift to Parsoid Fragment support v3 (T390420), Where Parsoid Read Views are the default, use it for MFE as well (T376048 T374578) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:07 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
  • 13:05 jforrester@deploy1003: Started scap sync-world: Backport for Shift to Parsoid Fragment support v3 (T390420), Where Parsoid Read Views are the default, use it for MFE as well (T376048 T374578)
  • 12:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
  • 12:55 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1045.eqiad.wmnet
  • 12:49 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
  • 12:49 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1045.eqiad.wmnet
  • 12:48 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:47 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 12:35 topranks: cloudsw1-d5-eqiad: add routes for WMCS OpenStack IPv6 aggregate to cloudgw VIP T389958
  • 12:32 topranks: cloudsw1-c8-eqiad: add routes for WMCS OpenStack IPv6 aggregate to cloudgw VIP T389958
  • 11:57 btullis@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on an-worker1169.eqiad.wmnet with reason: Moving to rack F8
  • 11:38 topranks: enable EBGP between cr2-eqiad and cloudsw1-d5-eqiad (IPv6 / cloud vrf) T389958
  • 11:25 topranks: enable EBGP between cr1-eqiad and cloudsw1-c8-eqiad (IPv6 / cloud vrf) T389958
  • 11:00 ladsgroup@deploy1003: Finished scap sync-world: Backport for Revert "Take 2: Large math formulae should be scrollable" (T201233) (duration: 13m 12s)
  • 11:00 fabfur: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134648 on A:cp-eqiad (T384227)
  • 10:58 fabfur: disable puppet on A:cp-eqiad to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134648 (T384227)
  • 10:53 ladsgroup@deploy1003: jdlrobson, ladsgroup: Continuing with sync
  • 10:53 ladsgroup@deploy1003: jdlrobson, ladsgroup: Backport for Revert "Take 2: Large math formulae should be scrollable" (T201233) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:50 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 10:50 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for Revert "Take 2: Large math formulae should be scrollable" (T201233)
  • 10:46 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 70% (T360589) (duration: 14m 22s)
  • 10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 10:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 10:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 10:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 10:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 10:41 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 10:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 10:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 10:39 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 10:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 10:37 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 70% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:32 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 70% (T360589)
  • 10:13 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 10:12 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:57 daniel@deploy1003: Finished scap sync-world: Backport for [pswiki] Change the logo and wordmark/tagline (T360851), [tawiki] Enable translator usergroup and only allows translator to use ContentTranslation (T391171) (duration: 18m 31s)
  • 09:49 daniel@deploy1003: superpes, daniel: Continuing with sync
  • 09:44 daniel@deploy1003: superpes, daniel: Backport for [pswiki] Change the logo and wordmark/tagline (T360851), [tawiki] Enable translator usergroup and only allows translator to use ContentTranslation (T391171) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:39 daniel@deploy1003: Started scap sync-world: Backport for [pswiki] Change the logo and wordmark/tagline (T360851), [tawiki] Enable translator usergroup and only allows translator to use ContentTranslation (T391171)
  • 09:22 fabfur: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134630 on A:cp-drmrs (T384227)
  • 09:18 fabfur: disable puppet on A:cp-drmrs to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134630 (T384227)
  • 09:13 daniel@deploy1003: Finished scap sync-world: Backport for EventIngress: use getDeletedPage instead of getPageStateBefore (T388588 T391051) (duration: 19m 43s)
  • 09:12 slyngshede@dns1004: END - running authdns-update
  • 09:09 slyngshede@dns1004: START - running authdns-update
  • 09:03 daniel@deploy1003: daniel: Continuing with sync
  • 09:00 XioNoX: push pfw policies - T390908
  • 08:58 daniel@deploy1003: daniel: Backport for EventIngress: use getDeletedPage instead of getPageStateBefore (T388588 T391051) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:53 daniel@deploy1003: Started scap sync-world: Backport for EventIngress: use getDeletedPage instead of getPageStateBefore (T388588 T391051)
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas1001.wikimedia.org
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas1001.wikimedia.org - ayounsi@cumin1002"
  • 08:44 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas1001.wikimedia.org - ayounsi@cumin1002"
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas1001.wikimedia.org on all recursors
  • 08:44 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache atlas1001.wikimedia.org on all recursors
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas1001.wikimedia.org - ayounsi@cumin1002"
  • 08:43 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas1001.wikimedia.org - ayounsi@cumin1002"
  • 08:39 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 08:39 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host atlas1001.wikimedia.org
  • 08:12 daniel@deploy1003: Started scap sync-world: Backport for EventIngress: use getDeletedPage instead of getPageStateBefore (T388588 T391051)
  • 08:10 fabfur: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133897 on A:cp-codfw (T384227)
  • 08:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2160,2234].codfw.wmnet with reason: Maintenance
  • 08:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2230.codfw.wmnet,db1176.eqiad.wmnet with reason: Maintenance
  • 08:06 fabfur: disable puppet on A:cp-codfw to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133897 (T384227)
  • 07:45 dcausse: T391122: reconciled 14 wikidata items (lost EventBus/eventgate events)
  • 06:40 daniel@deploy1003: Started scap sync-world: Backport for EventIngress: use getDeletedPage instead of getPageStateBefore (T388588 T391051)

2025-04-04

  • 21:18 inflatador: bking@apt1002 publish-wmf-opensearch-search-plugins_1.3.20-4 to component/opensearch13 bullseye-wikimedia 1134285
  • 20:22 urandom: starting `nodetool garbage collect -j 2`, sessionstore Cassandra
  • 19:03 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 19:03 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 18:57 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 18:56 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 18:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2045.codfw.wmnet with OS bookworm
  • 18:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2046.codfw.wmnet with OS bookworm
  • 18:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
  • 17:12 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:04 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:03 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:01 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:45 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:35 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:48 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:41 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:11 tchin@deploy1003: Finished deploy [airflow-dags/analytics@bece0a7]: (no justification provided) (duration: 00m 34s)
  • 15:11 tchin@deploy1003: Started deploy [airflow-dags/analytics@bece0a7]: (no justification provided)
  • 15:05 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:04 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:03 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:03 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:03 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:03 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:00 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:59 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:55 tchin@deploy1003: Finished deploy [analytics/refinery@c4ab9ef] (thin): THIN [analytics/refinery@c4ab9efd] (duration: 00m 59s)
  • 14:54 tchin@deploy1003: Started deploy [analytics/refinery@c4ab9ef] (thin): THIN [analytics/refinery@c4ab9efd]
  • 14:53 tchin@deploy1003: Finished deploy [analytics/refinery@c4ab9ef]: [analytics/refinery@c4ab9efd] (duration: 02m 54s)
  • 14:50 tchin@deploy1003: Started deploy [analytics/refinery@c4ab9ef]: [analytics/refinery@c4ab9efd]
  • 14:49 tchin@deploy1003: Finished deploy [analytics/refinery@c4ab9ef] (hadoop-test): TEST [analytics/refinery@c4ab9efd] (duration: 03m 01s)
  • 14:46 tchin@deploy1003: Started deploy [analytics/refinery@c4ab9ef] (hadoop-test): TEST [analytics/refinery@c4ab9efd]
  • 14:45 tchin: Deploying refinery for T389162
  • 14:43 claime: Extending root vg on mwmaint1002 by 20GB
  • 13:11 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:01 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Add Item and CustomItem classes as properties to `$.ui.ooMenu` (T390949) (duration: 15m 04s)
  • 10:54 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
  • 10:54 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Add Item and CustomItem classes as properties to `$.ui.ooMenu` (T390949) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:46 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Add Item and CustomItem classes as properties to `$.ui.ooMenu` (T390949)
  • 10:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be1070.eqiad.wmnet
  • 10:38 mvernon@cumin1002: START - Cookbook sre.hosts.remove-downtime for ms-be1070.eqiad.wmnet
  • 10:02 Emperor: bulk-VACUUM of container dbs ms-be1070 T377827
  • 10:02 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1070.eqiad.wmnet with reason: vacuum overlarge container dbs
  • 09:57 moritzm: installing vim security updates
  • 09:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:44 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:29 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 06:45 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@d6ad899]: Update artifacts for analytics_test (duration: 00m 15s)
  • 06:45 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@d6ad899]: Update artifacts for analytics_test
  • 06:44 aqu@deploy1003: Finished deploy [airflow-dags/analytics@d6ad899]: Update artifacts for analytics (duration: 00m 35s)
  • 06:44 aqu@deploy1003: Started deploy [airflow-dags/analytics@d6ad899]: Update artifacts for analytics
  • 05:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on db2186.codfw.wmnet with reason: Maintenance in sanitarium
  • 05:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on db1154.eqiad.wmnet with reason: Maintenance in sanitarium
  • 05:02 TimStarling: on mwmaint1002 ran cleanupBlocks.php on all wikis
  • 00:51 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:41 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:24 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:23 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:11 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply

2025-04-03

  • 23:45 tstarling@deploy1003: Finished scap sync-world: Backport for Enable Codex and Multiblocks in German and Italian wiki (T377121) (duration: 15m 25s)
  • 23:38 tstarling@deploy1003: hmonroy, tstarling: Continuing with sync
  • 23:35 tstarling@deploy1003: hmonroy, tstarling: Backport for Enable Codex and Multiblocks in German and Italian wiki (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:30 tstarling@deploy1003: Started scap sync-world: Backport for Enable Codex and Multiblocks in German and Italian wiki (T377121)
  • 21:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2056.codfw.wmnet with OS bullseye
  • 21:37 James_F: Backport deploy done.
  • 21:36 jforrester@deploy1003: Finished scap sync-world: Backport for Revert "VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias" (duration: 15m 28s)
  • 21:29 jforrester@deploy1003: jforrester: Continuing with sync
  • 21:29 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 21:29 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 21:28 jforrester@deploy1003: jforrester: Backport for Revert "VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:21 jforrester@deploy1003: Started scap sync-world: Backport for Revert "VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias"
  • 21:19 jforrester@deploy1003: Sync cancelled.
  • 21:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2056.codfw.wmnet with reason: host reimage
  • 21:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2056.codfw.wmnet with reason: host reimage
  • 21:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2056
  • 21:06 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2056
  • 21:06 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2056.codfw.wmnet with OS bullseye
  • 21:06 jforrester@deploy1003: esanders, jforrester: Backport for Mobile insert menu: Exclude media and signature tools (T385851), VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias (T388604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:00 jforrester@deploy1003: Started scap sync-world: Backport for Mobile insert menu: Exclude media and signature tools (T385851), VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias (T388604)
  • 20:27 jforrester@deploy1003: esanders, jforrester: Backport for wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase, Hide "Insert graph" tool in VE when graphs are disabled (T387501), Enable DiscussionTools visual enhancements on zhwiki (T379264), Revert "End EmailAuth enforcement group 2 test" synced to the testservers (https://wi
  • 20:23 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.10-1wm1_amd64.changes: T379797
  • 20:18 jforrester@deploy1003: Started scap sync-world: Backport for wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase, Hide "Insert graph" tool in VE when graphs are disabled (T387501), Enable DiscussionTools visual enhancements on zhwiki (T379264), Revert "End EmailAuth enforcement group 2 test"
  • 20:13 jforrester@deploy1003: sync-world aborted: Backport for End EmailAuth enforcement group 2 test (T390662), wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase (duration: 00m 33s)
  • 20:12 jforrester@deploy1003: Started scap sync-world: Backport for End EmailAuth enforcement group 2 test (T390662), wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase
  • 19:34 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 19:34 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 19:33 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:33 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:32 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 19:32 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 19:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2056
  • 19:20 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2056
  • 19:19 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2056
  • 19:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2056.codfw.wmnet 181.0.192.10.in-addr.arpa 1.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:19 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2056.codfw.wmnet 181.0.192.10.in-addr.arpa 1.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2056 - bking@cumin2002"
  • 19:19 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2056 - bking@cumin2002"
  • 19:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 19:17 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 19:15 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 19:14 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 19:14 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 19:14 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 19:13 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 19:13 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 19:13 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:13 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:13 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2056
  • 19:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2056.codfw.wmnet with OS bullseye
  • 19:13 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 19:13 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:12 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 19:12 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:11 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:06 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch2055*,cirrussearch2056* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 19:06 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch2055*,cirrussearch2056* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 19:02 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 19:02 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 18:21 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 18:20 dancy@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.23 refs T386218
  • 18:12 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 18:08 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 18:06 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 18:04 dancy@deploy1003: Installation of scap version "4.149.0" completed for 2 hosts
  • 18:03 dancy@deploy1003: Installing scap version "4.149.0" for 2 host(s)
  • 17:57 reedy@deploy1003: Finished scap sync-world: Backport for Banner: More reading from primary... (T390956), CommonSettings-labs: Update BounceHandler config (duration: 17m 43s)
  • 17:48 reedy@deploy1003: reedy: Continuing with sync
  • 17:47 reedy@deploy1003: reedy: Backport for Banner: More reading from primary... (T390956), CommonSettings-labs: Update BounceHandler config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:39 reedy@deploy1003: Started scap sync-world: Backport for Banner: More reading from primary... (T390956), CommonSettings-labs: Update BounceHandler config
  • 17:38 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up new PHP 8.1 production images (duration: 28m 57s)
  • 17:32 dzahn@dns1004: END - running authdns-update
  • 17:30 dzahn@dns1004: START - running authdns-update
  • 17:12 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:10 swfrench@deploy1003: Started scap sync-world: Deployment to pick up new PHP 8.1 production images
  • 17:02 sukhe@dns1004: END - running authdns-update
  • 17:02 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:02 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 17:00 sukhe@dns1004: START - running authdns-update
  • 16:58 reedy@deploy1003: Finished scap sync-world: Backport for Banner: While saving, do exists() against primary (T390956) (duration: 21m 33s)
  • 16:54 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:54 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:51 reedy@deploy1003: reedy: Continuing with sync
  • 16:44 reedy@deploy1003: reedy: Backport for Banner: While saving, do exists() against primary (T390956) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:37 reedy@deploy1003: Started scap sync-world: Backport for Banner: While saving, do exists() against primary (T390956)
  • 16:37 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:37 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:36 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:36 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:36 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:36 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:29 reedy@deploy1003: Finished scap sync-world: Backport for Banner: Conditionally check for banner existence from primary db (T390956) (duration: 15m 13s)
  • 16:22 hnowlan: decommissioning all but 1 eqiad jobrunner node in confctl
  • 16:22 reedy@deploy1003: reedy: Continuing with sync
  • 16:21 reedy@deploy1003: reedy: Backport for Banner: Conditionally check for banner existence from primary db (T390956) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 16:17 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 16:14 reedy@deploy1003: Started scap sync-world: Backport for Banner: Conditionally check for banner existence from primary db (T390956)
  • 16:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1166-1168].eqiad.wmnet
  • 16:06 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1166-1168].eqiad.wmnet
  • 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (#2) (T390662) (duration: 14m 15s)
  • 15:58 hnowlan: running homer 'cr*eqiad*' commit for new wikikube workers
  • 15:55 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1168.eqiad.wmnet with OS bookworm
  • 15:53 ladsgroup@deploy1003: tgr, ladsgroup: Continuing with sync
  • 15:52 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on elastic2056.codfw.wmnet with reason: adding net-new role
  • 15:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1167.eqiad.wmnet with OS bookworm
  • 15:52 ladsgroup@deploy1003: tgr, ladsgroup: Backport for Enable EmailAuth enforcement on group 2 for short test (#2) (T390662) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (#2) (T390662)
  • 15:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1166.eqiad.wmnet with OS bookworm
  • 15:40 reedy@deploy1003: Finished scap sync-world: Backport for Remove catching of db exception (T390956) (duration: 17m 28s)
  • 15:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1168.eqiad.wmnet with reason: host reimage
  • 15:34 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1167.eqiad.wmnet with reason: host reimage
  • 15:33 reedy@deploy1003: reedy: Continuing with sync
  • 15:32 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1168.eqiad.wmnet with reason: host reimage
  • 15:31 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1167.eqiad.wmnet with reason: host reimage
  • 15:30 reedy@deploy1003: reedy: Backport for Remove catching of db exception (T390956) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:24 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1166.eqiad.wmnet with reason: host reimage
  • 15:22 reedy@deploy1003: Started scap sync-world: Backport for Remove catching of db exception (T390956)
  • 15:21 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1166.eqiad.wmnet with reason: host reimage
  • 15:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1168
  • 15:17 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1168
  • 15:17 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1168.eqiad.wmnet with OS bookworm
  • 15:16 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1167
  • 15:16 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1167
  • 15:16 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1167.eqiad.wmnet with OS bookworm
  • 15:16 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1166.eqiad.wmnet wikikube-worker1167.eqiad.wmnet wikikube-worker1168.eqiad.wmnet on all recursors
  • 15:16 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1166.eqiad.wmnet wikikube-worker1167.eqiad.wmnet wikikube-worker1168.eqiad.wmnet on all recursors
  • 15:15 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1438 to wikikube-worker1168
  • 15:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1168
  • 15:14 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:13 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1168
  • 15:13 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:13 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1438 to wikikube-worker1168 - hnowlan@cumin1002"
  • 15:13 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1438 to wikikube-worker1168 - hnowlan@cumin1002"
  • 15:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1437 to wikikube-worker1167
  • 15:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1167
  • 15:10 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:09 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:09 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:09 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 15:09 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1167
  • 15:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1437 to wikikube-worker1167 - hnowlan@cumin1002"
  • 15:08 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:08 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1437 to wikikube-worker1167 - hnowlan@cumin1002"
  • 15:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1166
  • 15:06 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1166
  • 15:06 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1166.eqiad.wmnet with OS bookworm
  • 15:04 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1438 to wikikube-worker1168
  • 15:03 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 15:03 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1437 to wikikube-worker1167
  • 14:49 tgr@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (T390662) (duration: 16m 18s)
  • 14:42 tgr@deploy1003: tgr: Continuing with sync
  • 14:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2056* for ban node before reimaging - bking@cumin2002 - T388610
  • 14:42 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2056* for ban node before reimaging - bking@cumin2002 - T388610
  • 14:42 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic2056 for ban node before reimaging - bking@cumin2002 - T388610
  • 14:42 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2056 for ban node before reimaging - bking@cumin2002 - T388610
  • 14:39 tgr@deploy1003: tgr: Backport for Enable EmailAuth enforcement on group 2 for short test (T390662) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:33 tgr@deploy1003: Started scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (T390662)
  • 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 14:22 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 14:18 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 14:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 14:12 taavi@deploy1003: Finished scap sync-world: re-syncing 1133581 (duration: 08m 58s)
  • 14:05 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1420 to wikikube-worker1166
  • 14:05 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1166
  • 14:03 taavi@deploy1003: Started scap sync-world: re-syncing 1133581
  • 14:03 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:03 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:02 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1166
  • 14:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1420 to wikikube-worker1166 - hnowlan@cumin1002"
  • 14:02 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1420 to wikikube-worker1166 - hnowlan@cumin1002"
  • 13:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
  • 13:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
  • 13:55 taavi@deploy1003: scap failed: <CalledProcessError> Command '['helmfile', '-e', 'eqiad', '--selector', 'name=main', 'write-values', '--output-file-template', '/tmp/tmp1ws3xaaw']' returned non-zero exit status 1. (scap version: 4.148.0) (duration: 16m 20s)
  • 13:54 taavi@deploy1003: cscott, taavi: Continuing with sync
  • 13:51 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 13:51 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1420 to wikikube-worker1166
  • 13:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
  • 13:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
  • 13:46 taavi@deploy1003: cscott, taavi: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:45 moritzm: imported imposm3 0.14.1-1 to apt.wikimedia.org for bookworm-wikimedia T389780 T381565
  • 13:39 taavi@deploy1003: Started scap sync-world: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420)
  • 13:38 taavi: install1004: kill a dead `/usr/bin/apt-mark showmanual` process holding puppet runs
  • 13:34 taavi@deploy1003: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.44.0-wmf.22,1.44.0-wmf.23 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/
  • 13:32 taavi@deploy1003: Started scap sync-world: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420)
  • 13:30 taavi@deploy1003: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.44.0-wmf.22,1.44.0-wmf.23 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/
  • 13:28 taavi@deploy1003: Started scap sync-world: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420)
  • 13:28 akosiaris@dns1004: END - running authdns-update
  • 13:27 taavi@deploy1003: Finished scap sync-world: Backport for Enable Parsoid Read Views on 13 wiktionaries (T390680), Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002) (duration: 19m 40s)
  • 13:25 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 13:25 akosiaris@dns1004: START - running authdns-update
  • 13:25 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 13:20 taavi@deploy1003: ihurbain, taavi: Continuing with sync
  • 13:17 taavi@deploy1003: ihurbain, taavi: Backport for Enable Parsoid Read Views on 13 wiktionaries (T390680), Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 13:07 taavi@deploy1003: Started scap sync-world: Backport for Enable Parsoid Read Views on 13 wiktionaries (T390680), Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002)
  • 13:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:05 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
  • 13:04 jmm@cumin2002: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on A:thanos-fe
  • 13:02 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 12:56 moritzm: prune now obsolete nginx packages from testreduce1002 T329529
  • 12:55 godog: move k8s instances from prometheus1006 to prometheus1008 - T383232
  • 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 12:54 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:53 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 12:53 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:48 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:47 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:42 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 12:28 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 12:25 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:24 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:22 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-test
  • 12:21 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-test
  • 12:16 moritzm: installing libxslt security updates
  • 11:58 moritzm: installing Intel microcode security updates
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 11:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 11:46 moritzm: installing Django security updates on Bullseye
  • 11:37 moritzm: installing Python 3.9 security updates
  • 11:33 topranks: reboot cr2-eqord to complete JunOS upgrade T364092
  • 11:31 topranks: disable EBGP sessions to internet peers on cr2-eqord to prep for JunOS upgrade T364092
  • 11:30 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-codfw,cr2-eqiad,cr2-eqord,cr2-eqord IPv6,cr3-ulsfo with reason: Upgrade cr2-eqord JunOS
  • 11:07 moritzm: installing nodejs security updates
  • 11:06 topranks: prepend AS paths announced to codfw/eqiad from eqord to prep for JunOS upgrade T364092
  • 11:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 65% (T360589) (duration: 16m 34s)
  • 10:55 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 10:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe2003.codfw.wmnet with OS bookworm
  • 10:54 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 10:53 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 65% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:51 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 10:50 topranks: drain transport circuits to eqord (Chicago network PoP) to prep for JunOS upgrade of cr2-eqord T364092
  • 10:48 moritzm: remove nodejs from aqs* hosts, no longer used/needed and spares us needless security rollouts T350143
  • 10:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 65% (T360589)
  • 10:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe2003.codfw.wmnet with reason: host reimage
  • 10:27 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe2003.codfw.wmnet with reason: host reimage
  • 10:22 akosiaris@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 10:22 akosiaris@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
  • 10:22 akosiaris@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:22 akosiaris@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:21 akosiaris@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:21 akosiaris@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:20 akosiaris@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:20 akosiaris@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:18 akosiaris@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:18 akosiaris@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:17 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:17 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:17 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:16 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:16 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:14 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:14 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:10 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
  • 10:02 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on cp4047.ulsfo.wmnet with reason: HW errors
  • 09:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:59 fabfur: disable puppet on A:cp-eqsin
  • 09:59 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133850 to use TLS on tmpfs on A:cp-eqsin (T384227)
  • 09:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:54 akosiaris: deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1133745 in all k8s ingresses to stop ingressgateway from forcefully setting the HTTP server header in the responses to "istio-envoy"
  • 09:52 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:52 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:52 godog: lvextend --resizefs --size +1TB vg0/srv on mwlog[12]002
  • 09:52 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:51 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
  • 09:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
  • 09:03 fabfur: secure deleting certificates in /etc/ssl/private from A:cp-ulsfo (T384227)
  • 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 08:53 fabfur: secure deleting certificates in /etc/ssl/private from A:cp-magru (T384227)
  • 08:48 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided) (duration: 01m 03s)
  • 08:47 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided)
  • 08:46 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided) (duration: 00m 54s)
  • 08:45 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided)
  • 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
  • 08:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
  • 08:24 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 08:22 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 08:21 hashar: Upgrading CI Jenkins
  • 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 08:20 slyngshede@dns1004: END - running authdns-update
  • 08:18 slyngshede@dns1004: START - running authdns-update
  • 08:12 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:12 slyngshede@dns1004: START - running authdns-update
  • 08:06 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
  • 07:54 moritzm: failover ganeti masters in esams to ganeti3007/3008
  • 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
  • 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
  • 07:44 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
  • 07:44 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
  • 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
  • 07:38 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
  • 07:38 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
  • 07:36 moritzm: added spiderpig-access LDAP group T390338
  • 07:31 fabfur: applying patch to use TLS on tmpfs on A:cp-ulsfo (T384227)
  • 07:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
  • 07:27 fabfur: disabling puppet on A:cp-ulsfo to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133405 (T384227)
  • 07:22 elukey: restart docker on deploy1003 to pick up max-concurrent-uploads=1 - T390251
  • 07:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
  • 07:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
  • 07:07 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 07:07 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
  • 06:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
  • 00:39 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2006
  • 00:16 tstarling@deploy1003: Finished scap sync-world: Backport for Temporarily disable Lua profiler (T389734) (duration: 15m 04s)
  • 00:15 zabe: zabe@mwmaint1002:~$ cat group2.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/AbuseFilter/maintenance/MigrateESRefToAflTable.php {} --deletedump /home/zabe/afl_text_table_deletedump/{} --dump /home/zabe/afl_text_table_dump/{} --sleep 0.4" # T381599
  • 00:09 tstarling@deploy1003: tstarling: Continuing with sync
  • 00:08 tstarling@deploy1003: tstarling: Backport for Temporarily disable Lua profiler (T389734) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:01 tstarling@deploy1003: Started scap sync-world: Backport for Temporarily disable Lua profiler (T389734)

2025-04-02

  • 23:32 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore1006
  • 23:28 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2005
  • 22:38 jhathaway: puppet private repo changes completed, T385995
  • 22:01 brett: Import ncmonitor 1.3.3 into bookworm-wikimedia
  • 22:00 dreamyjazz@deploy1003: Finished scap sync-world: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904) (duration: 25m 19s)
  • 21:55 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:52 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:41 dreamyjazz@deploy1003: dreamyjazz: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:37 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2004
  • 21:35 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore1005
  • 21:35 dreamyjazz@deploy1003: Started scap sync-world: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904)
  • 21:31 reedy@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth enforcement on group 0/1 (T390662) (duration: 15m 42s)
  • 21:23 reedy@deploy1003: reedy, tgr: Continuing with sync
  • 21:21 reedy@deploy1003: reedy, tgr: Backport for Enable EmailAuth enforcement on group 0/1 (T390662) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:15 reedy@deploy1003: Started scap sync-world: Backport for Enable EmailAuth enforcement on group 0/1 (T390662)
  • 21:07 reedy@deploy1003: Finished scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
  • 21:00 reedy@deploy1003: d3r1ck01, matmarex, reedy: Continuing with sync
  • 20:52 reedy@deploy1003: d3r1ck01, matmarex, reedy: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager (gerrit:1133504)
  • 20:47 reedy@deploy1003: Started scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
  • 20:14 reedy@deploy1003: Started scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
  • 19:54 jhathaway: rolling out a change to private repo, 1127150, please let me know if any issues arise when merging patches
  • 18:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2003.codfw.wmnet with OS bookworm
  • 18:35 dancy@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.23 refs T386218
  • 18:35 cstone: SmashPig upgraded from b9310c06 to 642ae816
  • 18:00 reedy@deploy1003: reedy: Continuing with sync
  • 18:00 reedy@deploy1003: reedy: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[gerrit:1133471|EmailA
  • 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
  • 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:47 reedy@deploy1003: Started scap sync-world: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[ger
  • 17:41 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:40 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:34 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 17:34 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 17:33 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 17:31 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 17:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 17:30 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 17:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 17:27 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 17:27 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 17:25 urandom: starting `nodetool garbagecollect` on sessionstore1004
  • 17:17 urandom: updating Cassandra/sessionstore `gc_grace_seconds` to 259200 (from 864000)
  • 17:13 brett: reloading varnish-frontend on A:cp and not A:cp-text_drmrs and not A:cp-text_codfw
  • 17:08 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on cirrussearch2055.codfw.wmnet with reason: adding net-new role
  • 16:52 reedy@deploy1003: Started scap sync-world: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[ger
  • 16:27 vgutierrez: reload varnish on text@codfw to discard stale VCLs - T390846
  • 16:26 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up change in mediawiki-deployments.yaml - T389499 (duration: 03m 21s)
  • 16:25 swfrench@deploy1003: swfrench: Continuing with sync
  • 16:24 swfrench@deploy1003: swfrench: Deployment to pick up change in mediawiki-deployments.yaml - T389499 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:23 vgutierrez: reload varnish on text@drmrs to discard stale VCLs - T390846
  • 16:23 swfrench@deploy1003: Started scap sync-world: Deployment to pick up change in mediawiki-deployments.yaml - T389499
  • 16:10 swfrench-wmf: run-puppet-agent on deploy1003 to pick up mediawiki-deployments.yaml changes - T389499
  • 15:28 arnaudb@dns1004: END - running authdns-update
  • 15:19 arnaudb@dns1004: START - running authdns-update
  • 15:16 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 15:15 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1003.wikimedia.org with reason: maintenance
  • 15:07 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 15:06 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 14:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
  • 14:43 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
  • 14:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2003.codfw.wmnet with OS bookworm
  • 14:35 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
  • 14:18 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
  • 14:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
  • 14:12 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:06 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.8.0 - volans@cumin1002
  • 14:05 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:04 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:03 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:01 volans: upgrading homer to version 0.8.0 to cumin hosts
  • 14:01 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:00 volans@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.8.0 - volans@cumin1002
  • 13:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
  • 13:52 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
  • 13:49 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
  • 13:49 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 13:43 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
  • 13:41 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 13:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
  • 13:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
  • 13:37 akosiaris: depool cp3066 for debugging T390854
  • 13:37 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
  • 13:35 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
  • 13:33 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
  • 13:24 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:21 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile (duration: 16m 55s)
  • 13:19 moritzm: installing gnutls28 security updates on Bookworm
  • 13:14 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 13:14 lucaswerkmeister-wmde@deploy1003: jakob, hashar, lucaswerkmeister-wmde: Continuing with sync
  • 13:14 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 13:11 lucaswerkmeister-wmde@deploy1003: jakob, hashar, lucaswerkmeister-wmde: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile
  • 12:58 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:58 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 12:58 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:57 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 12:57 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:57 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2003.codfw.wmnet
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74582 and previous config saved to /var/cache/conftool/dbconfig/20250402-124139-root.json
  • 12:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2003.codfw.wmnet
  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2002.codfw.wmnet
  • 12:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74581 and previous config saved to /var/cache/conftool/dbconfig/20250402-123029-root.json
  • 12:28 jmm@dns1004: END - running authdns-update
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2002.codfw.wmnet
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74580 and previous config saved to /var/cache/conftool/dbconfig/20250402-122634-root.json
  • 12:26 jmm@dns1004: START - running authdns-update
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2001.codfw.wmnet
  • 12:18 akosiaris@dns1004: END - running authdns-update
  • 12:16 akosiaris@dns1004: START - running authdns-update
  • 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74579 and previous config saved to /var/cache/conftool/dbconfig/20250402-121524-root.json
  • 12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P74578 and previous config saved to /var/cache/conftool/dbconfig/20250402-121128-root.json
  • 12:11 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
  • 12:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P74577 and previous config saved to /var/cache/conftool/dbconfig/20250402-120018-root.json
  • 11:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74576 and previous config saved to /var/cache/conftool/dbconfig/20250402-115623-root.json
  • 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74575 and previous config saved to /var/cache/conftool/dbconfig/20250402-114512-root.json
  • 11:44 fabfur: securely erase certificates from A:cp-magru and provide symlink for acmecerts (T384227)
  • 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P74574 and previous config saved to /var/cache/conftool/dbconfig/20250402-114117-root.json
  • 11:40 vgutierrez: restart varnish on cp6016 - T390846
  • 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P74573 and previous config saved to /var/cache/conftool/dbconfig/20250402-113007-root.json
  • 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P74572 and previous config saved to /var/cache/conftool/dbconfig/20250402-112611-root.json
  • 11:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 11:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 11:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 11:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:19 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:18 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:17 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 11:17 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 11:16 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
  • 11:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
  • 11:16 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:15 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1043.eqiad.wmnet
  • 11:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P74571 and previous config saved to /var/cache/conftool/dbconfig/20250402-111501-root.json
  • 11:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74570 and previous config saved to /var/cache/conftool/dbconfig/20250402-111106-root.json
  • 11:10 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
  • 11:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1043.eqiad.wmnet
  • 11:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
  • 11:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 60% (T360589) (duration: 15m 11s)
  • 11:04 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:02 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:01 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 11:00 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
  • 11:00 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 60% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74569 and previous config saved to /var/cache/conftool/dbconfig/20250402-105956-root.json
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P74568 and previous config saved to /var/cache/conftool/dbconfig/20250402-105601-root.json
  • 10:53 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 60% (T360589)
  • 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P74567 and previous config saved to /var/cache/conftool/dbconfig/20250402-104450-root.json
  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74566 and previous config saved to /var/cache/conftool/dbconfig/20250402-104055-root.json
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74564 and previous config saved to /var/cache/conftool/dbconfig/20250402-102944-root.json
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74563 and previous config saved to /var/cache/conftool/dbconfig/20250402-102549-root.json
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
  • 10:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
  • 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74561 and previous config saved to /var/cache/conftool/dbconfig/20250402-101439-root.json
  • 10:13 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:13 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 10:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P74560 and previous config saved to /var/cache/conftool/dbconfig/20250402-101044-root.json
  • 10:10 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:09 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 10:09 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:09 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P74559 and previous config saved to /var/cache/conftool/dbconfig/20250402-095933-root.json
  • 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74558 and previous config saved to /var/cache/conftool/dbconfig/20250402-095538-root.json
  • 09:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2243 to dbctl depooled T381475', diff saved to https://phabricator.wikimedia.org/P74557 and previous config saved to /var/cache/conftool/dbconfig/20250402-095213-marostegui.json
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74556 and previous config saved to /var/cache/conftool/dbconfig/20250402-094428-root.json
  • 09:41 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1257 to dbctl depooled T381475', diff saved to https://phabricator.wikimedia.org/P74555 and previous config saved to /var/cache/conftool/dbconfig/20250402-094109-marostegui.json
  • 09:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
  • 09:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1042.eqiad.wmnet
  • 09:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 09:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
  • 09:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1042.eqiad.wmnet
  • 09:29 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:27 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
  • 09:24 XioNoX: rebooting mr1-ulsfo - T390052
  • 09:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
  • 09:23 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mr1-ulsfo with reason: reboot
  • 09:21 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:21 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
  • 09:19 akosiaris@dns1004: END - running authdns-update
  • 09:18 akosiaris: create mw-wikifunctions-ingress.discovery.wmnet and .svc records to facilitate the migration to ingress
  • 09:17 moritzm: failover ganeti masters in drmrs to ganeti6001/6002
  • 09:16 akosiaris@dns1004: START - running authdns-update
  • 09:16 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
  • 08:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti6001.drmrs.wmnet
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 08:55 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
  • 08:48 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
  • 08:48 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 08:48 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 08:47 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 08:47 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 08:47 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:46 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:46 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:45 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:45 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 08:41 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:40 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:38 jmm@dns1004: END - running authdns-update
  • 08:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:36 jmm@dns1004: START - running authdns-update
  • 08:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:32 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 08:32 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:31 XioNoX: trunk sandbox vlan to eqiad row B ganeti - T385560
  • 08:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:30 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:28 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:26 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:23 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:18 fabfur: repooled cp7001 (T384227)
  • 08:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:49 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:49 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2013.*,lvs1019.*} and A:lvs
  • 07:46 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2013.*,lvs1019.*} and A:lvs
  • 07:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:39 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:36 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2014.*,lvs1020.*} and A:lvs
  • 07:34 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2014.*,lvs1020.*} and A:lvs
  • 07:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:29 fabfur: depool cp7001 to fix stale ocsp alert (T384227)
  • 07:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:18 jmm@dns1004: END - running authdns-update
  • 07:16 jmm@dns1004: START - running authdns-update
  • 07:02 jmm@dns1004: END - running authdns-update
  • 06:59 jmm@dns1004: START - running authdns-update
  • 06:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet

2025-04-01

  • 23:43 reedy@deploy1003: rebuilt and synchronized wikiversions files: pihwiki to .23
  • 23:40 ladsgroup@dns1004: END - running authdns-update
  • 23:38 ladsgroup@dns1004: START - running authdns-update
  • 23:34 ladsgroup@dns1004: END - running authdns-update
  • 23:32 ladsgroup@dns1004: START - running authdns-update
  • 23:27 ladsgroup@dns1004: END - running authdns-update
  • 23:25 ladsgroup@dns1004: START - running authdns-update
  • 23:20 ladsgroup@dns1004: END - running authdns-update
  • 23:18 ladsgroup@dns1004: START - running authdns-update
  • 23:03 ladsgroup@dns1004: END - running authdns-update
  • 23:00 ladsgroup@dns1004: START - running authdns-update
  • 22:04 bking@cumin2002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for cirrussearch2055.codfw.wmnet: Renew puppet certificate - bking@cumin2002
  • 21:41 mutante: deploy1003 sudo -u mwdeploy /usr/local/bin/mwscript-cleanup --debug eqiad
  • 20:46 taavi@deploy1003: Finished scap sync-world: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642) (duration: 16m 59s)
  • 20:39 taavi@deploy1003: migr, taavi: Continuing with sync
  • 20:37 taavi@deploy1003: migr, taavi: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2006.codfw.wmnet with OS bullseye
  • 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2016.codfw.wmnet with OS bullseye
  • 20:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:30 taavi@deploy1003: Started scap sync-world: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642)
  • 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2015.codfw.wmnet with OS bullseye
  • 20:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2007.codfw.wmnet with OS bullseye
  • 20:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2005.codfw.wmnet with OS bullseye
  • 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:21 taavi@deploy1003: Finished scap sync-world: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732) (duration: 14m 18s)
  • 20:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:14 taavi@deploy1003: superpes, taavi: Continuing with sync
  • 20:13 taavi@deploy1003: superpes, taavi: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2006.codfw.wmnet with reason: host reimage
  • 20:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2016.codfw.wmnet with reason: host reimage
  • 20:07 taavi@deploy1003: Started scap sync-world: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732)
  • 20:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2015.codfw.wmnet with reason: host reimage
  • 20:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2007.codfw.wmnet with reason: host reimage
  • 19:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2005.codfw.wmnet with reason: host reimage
  • 19:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2016.codfw.wmnet with reason: host reimage
  • 19:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2007.codfw.wmnet with reason: host reimage
  • 19:55 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2015.codfw.wmnet with reason: host reimage
  • 19:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2006.codfw.wmnet with reason: host reimage
  • 19:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2005.codfw.wmnet with reason: host reimage
  • 19:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
  • 19:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2016.codfw.wmnet with OS bullseye
  • 19:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2007.codfw.wmnet with OS bullseye
  • 19:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2015.codfw.wmnet with OS bullseye
  • 19:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2006.codfw.wmnet with OS bullseye
  • 19:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2005.codfw.wmnet with OS bullseye
  • 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['apus-fe2003']
  • 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe2016']
  • 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe2015']
  • 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2007']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2007']
  • 19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2006']
  • 19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2005']
  • 19:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['thanos-fe2007']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2015']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2016']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['apus-fe2003']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2007']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2006']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2005']
  • 19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2003
  • 19:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2003
  • 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2016
  • 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2016
  • 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2015
  • 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2015
  • 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2007
  • 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2007
  • 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2006
  • 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2006
  • 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2005
  • 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2005
  • 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-fe2005-7, ms-fe2015-6, and apus-fe2003 to codfw - jhancock@cumin2002"
  • 19:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-fe2005-7, ms-fe2015-6, and apus-fe2003 to codfw - jhancock@cumin2002"
  • 19:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 18:50 cstone: payments-wiki upgraded from 19b1c505 to e090b97b
  • 18:25 bking@cumin2002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cirrussearch2055.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
  • 18:25 bking@cumin2002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cirrussearch2055.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
  • 18:20 dzahn@dns1004: END - running authdns-update
  • 18:19 mforns@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 18:19 mforns@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 18:17 dzahn@dns1004: START - running authdns-update
  • 18:15 dancy@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.23 refs T386218
  • 18:11 dancy@deploy1003: Testing. Disregard
  • 17:58 herron@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-aux-rw,name=codfw
  • 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-rw,name=eqiad
  • 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-rw,name=codfw
  • 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-ro,name=codfw
  • 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-ro,name=eqiad
  • 17:41 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2055.codfw.wmnet with OS bullseye
  • 17:25 brett: importing varnishkafka 1.2.0-1 into bullseye-wikimedia main (T378737)
  • 17:25 brett: importing libvmod-re2/varnish-re2 2.0.0-2~bpo11+wmf2 into bullseye-wikimedia main (T378737)
  • 17:24 brett: importing libvmod-querysort 0.4-3 into bullseye-wikimedia main (T378737)
  • 17:24 brett: importing libvmod-netmapper 1.9.1-1 into bullseye-wikimedia main (T378737)
  • 17:23 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
  • 17:23 brett: importing varnish-modules 0.20.0-2~bpo11 into bullseye-wikimedia main (T378737)
  • 17:23 fabfur: repool cp7001, no certs removed (T384227)
  • 17:22 brett: importing varnish 7.1.1-1.1~bpo11+wmf1 into bullseye-wikimedia main (T378737)
  • 16:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2055
  • 16:23 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2055
  • 16:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2055.codfw.wmnet with OS bullseye
  • 16:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2055.codfw.wmnet with OS bullseye
  • 15:45 topranks: removing et-0/0/0 from ae0 bundle on cr3-ulsfo and cr4-ulsfo T390731
  • 15:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on 27 hosts with reason: Maintenance in s2
  • 15:27 dzahn@dns1004: END - running authdns-update
  • 15:25 mutante: DNS - new project language 'nup' - Nupe (also known as Anufe, Nupenci, Nyinfe, and Tapa) is a Volta–Niger language of the Nupoid branch primarily spoken by the Nupe people of the North Central region of Nigeria.
  • 15:24 dzahn@dns1004: START - running authdns-update
  • 15:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:18 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:11 brennen@deploy1003: Finished deploy [phabricator/deployment@53fcaf8]: deploy phab1004 for T390737 (duration: 00m 36s)
  • 15:10 brennen@deploy1003: Started deploy [phabricator/deployment@53fcaf8]: deploy phab1004 for T390737
  • 15:09 brennen@deploy1003: Finished deploy [phabricator/deployment@53fcaf8]: test deploy phab2002 for T390737 (duration: 00m 39s)
  • 15:08 brennen@deploy1003: Started deploy [phabricator/deployment@53fcaf8]: test deploy phab2002 for T390737
  • 15:05 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: phabricator deploy
  • 15:04 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: phabricator deploy
  • 14:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:51 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet [reason: finished T390658]
  • 14:50 fabfur: depooled cp7001 to test secure removal of unused certificates (T384227)
  • 14:49 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
  • 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2006.codfw.wmnet with OS bookworm
  • 14:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2055
  • 14:47 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2055
  • 14:46 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2055
  • 14:46 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2055.codfw.wmnet 180.0.192.10.in-addr.arpa 0.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:46 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2055.codfw.wmnet 180.0.192.10.in-addr.arpa 0.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:46 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:46 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2055 - bking@cumin2002"
  • 14:46 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2055 - bking@cumin2002"
  • 14:42 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:41 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:41 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:41 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 14:40 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:40 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 14:40 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2055
  • 14:40 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2055.codfw.wmnet with OS bullseye
  • 14:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2055 to cirrussearch2055
  • 14:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2055
  • 14:37 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2055
  • 14:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2055 to cirrussearch2055 - bking@cumin2002"
  • 14:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2055 to cirrussearch2055 - bking@cumin2002"
  • 14:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for CommonSettings: Remove old BounceHandler DB config (duration: 15m 28s)
  • 14:32 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:31 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2055 to cirrussearch2055
  • 14:28 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 14:27 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
  • 14:26 ladsgroup@deploy1003: reedy, ladsgroup: Continuing with sync
  • 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74547 and previous config saved to /var/cache/conftool/dbconfig/20250401-142516-ladsgroup.json
  • 14:24 ladsgroup@deploy1003: reedy, ladsgroup: Backport for CommonSettings: Remove old BounceHandler DB config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
  • 14:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74546 and previous config saved to /var/cache/conftool/dbconfig/20250401-142228-ladsgroup.json
  • 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for CommonSettings: Remove old BounceHandler DB config
  • 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74545 and previous config saved to /var/cache/conftool/dbconfig/20250401-141008-ladsgroup.json
  • 14:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74544 and previous config saved to /var/cache/conftool/dbconfig/20250401-140721-ladsgroup.json
  • 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 14:05 elukey: roll restart nginx on registry* to remove debug logging - too much data, filling up the root partition
  • 14:02 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host registry2005.codfw.wmnet
  • 14:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2006.codfw.wmnet with OS bookworm
  • 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74543 and previous config saved to /var/cache/conftool/dbconfig/20250401-135501-ladsgroup.json
  • 13:53 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host registry2005.codfw.wmnet
  • 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74542 and previous config saved to /var/cache/conftool/dbconfig/20250401-135215-ladsgroup.json
  • 13:48 elukey: depool registry2005 to investigate some nginx logging issue
  • 13:44 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2035.codfw.wmnet [reason: T390658]
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74540 and previous config saved to /var/cache/conftool/dbconfig/20250401-133954-ladsgroup.json
  • 13:39 elukey: restart nginx on registry2005 - stuck writing error logs
  • 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2005.codfw.wmnet with OS bookworm
  • 13:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 13:37 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74539 and previous config saved to /var/cache/conftool/dbconfig/20250401-133707-ladsgroup.json
  • 13:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 13:35 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm
  • 13:29 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:28 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development (duration: 18m 04s)
  • 13:27 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
  • 13:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
  • 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74537 and previous config saved to /var/cache/conftool/dbconfig/20250401-132407-ladsgroup.json
  • 13:24 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 13:21 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, cjming, matmarex: Continuing with sync
  • 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74536 and previous config saved to /var/cache/conftool/dbconfig/20250401-132059-ladsgroup.json
  • 13:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:20 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
  • 13:20 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
  • 13:18 moritzm: installing python-cryptography security updates
  • 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
  • 13:17 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, cjming, matmarex: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74534 and previous config saved to /var/cache/conftool/dbconfig/20250401-131530-ladsgroup.json
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
  • 13:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
  • 13:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
  • 13:10 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development
  • 13:05 elukey: restart nginx on registry* to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133112 - debug logs to /var/log/nginx/debug.log - T390251
  • 13:04 XioNoX: msw2-eqiad> restart jsd gracefully - T390052
  • 13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74533 and previous config saved to /var/cache/conftool/dbconfig/20250401-130023-ladsgroup.json
  • 12:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm
  • 12:48 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2005.codfw.wmnet with OS bookworm
  • 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2004.codfw.wmnet with OS bookworm
  • 12:47 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 12:47 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74530 and previous config saved to /var/cache/conftool/dbconfig/20250401-124516-ladsgroup.json
  • 12:44 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 12:44 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:43 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 12:43 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:42 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 12:42 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:42 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.zarcillo (exit_code=99)
  • 12:41 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
  • 12:41 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.zarcillo (exit_code=99)
  • 12:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1039.eqiad.wmnet
  • 12:39 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti4008.ulsfo.wmnet
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 12:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
  • 12:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
  • 12:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74529 and previous config saved to /var/cache/conftool/dbconfig/20250401-123009-ladsgroup.json
  • 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
  • 12:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
  • 12:23 moritzm: installing PHP 7.4 security updates (as shipped in Debian, not our internal build running on a few remaining edge cases)
  • 12:12 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:11 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:11 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:11 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
  • 12:08 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
  • 12:08 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2004.codfw.wmnet with OS bookworm
  • 12:02 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 12:02 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
  • 12:02 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 12:02 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74528 and previous config saved to /var/cache/conftool/dbconfig/20250401-115935-ladsgroup.json
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2003.codfw.wmnet with OS bookworm
  • 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P74527 and previous config saved to /var/cache/conftool/dbconfig/20250401-114428-ladsgroup.json
  • 11:34 Lucas_WMDE: Deployed patch for T389369
  • 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
  • 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P74526 and previous config saved to /var/cache/conftool/dbconfig/20250401-112921-ladsgroup.json
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
  • 11:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
  • 11:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet
  • 11:24 moritzm: installing squid security updates
  • 11:22 hashar: Restarting Gerrit
  • 11:19 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
  • 11:18 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet
  • 11:16 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti4008.ulsfo.wmnet
  • 11:16 topranks: reboot cr4-ulsfo to upgrade JunOS T364092
  • 11:15 hashar: Restarted Gerrit replica on gerrit2002 to raise heap from 32G to 64G | T387223
  • 11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74525 and previous config saved to /var/cache/conftool/dbconfig/20250401-111415-ladsgroup.json
  • 11:13 volans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1002.eqiad.wmnet with reason: Test
  • 11:12 moritzm: restarting FPM on phab1004 to pick up security update
  • 11:10 volans: upgrading spicerack to v10.0.0 on cumin1002
  • 11:10 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Upgrade cr4-ulsfo JunOS
  • 11:06 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti4008.ulsfo.wmnet with reason: remove from cluster for reimage
  • 11:06 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2003.codfw.wmnet with OS bookworm
  • 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 11:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet
  • 11:04 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 11:04 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
  • 11:02 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 10:58 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 55% (T360589) (duration: 22m 03s)
  • 10:58 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 10:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet
  • 10:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1006.eqiad.wmnet
  • 10:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1211.eqiad.wmnet onto db1257.eqiad.wmnet
  • 10:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1211 slowly with 10 steps - Pool db1211.eqiad.wmnet in after cloning
  • 10:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74523 and previous config saved to /var/cache/conftool/dbconfig/20250401-105425-ladsgroup.json
  • 10:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 10:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet
  • 10:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1006.eqiad.wmnet
  • 10:48 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 10:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74522 and previous config saved to /var/cache/conftool/dbconfig/20250401-104659-ladsgroup.json
  • 10:46 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 10:46 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 55% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:45 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-test
  • 10:43 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-test
  • 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
  • 10:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
  • 10:36 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 55% (T360589)
  • 10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
  • 10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet
  • 10:27 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
  • 10:26 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet
  • 10:25 akosiaris@deploy1003: Finished scap sync-world: Backport for typos: Add wnmet as a typo (duration: 29m 34s)
  • 10:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
  • 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm
  • 10:19 jiji@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc-gp2004.codfw.wmnet
  • 10:19 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
  • 10:19 aqu@deploy1003: Finished deploy [airflow-dags/analytics@d96f732]: Update artifacts for analytics (duration: 00m 59s)
  • 10:18 aqu@deploy1003: Started deploy [airflow-dags/analytics@d96f732]: Update artifacts for analytics
  • 10:17 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
  • 10:17 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@d96f732]: Update artifacts for analytics_test (duration: 00m 12s)
  • 10:17 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@d96f732]: Update artifacts for analytics_test
  • 10:17 jiji@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc-gp1004.eqiad.wmnet
  • 10:16 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm
  • 10:09 akosiaris@deploy1003: akosiaris: Continuing with sync
  • 10:08 akosiaris@deploy1003: akosiaris: Backport for typos: Add wnmet as a typo synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:00 joal@deploy1003: Finished deploy [analytics/refinery@efc4808] (hadoop-test): Analytics webrequest migration TEST [analytics/refinery@efc48089] (duration: 00m 40s)
  • 09:59 joal@deploy1003: Started deploy [analytics/refinery@efc4808] (hadoop-test): Analytics webrequest migration TEST [analytics/refinery@efc48089]
  • 09:59 joal@deploy1003: Finished deploy [analytics/refinery@efc4808] (thin): Analytics webrequest migration THIN [analytics/refinery@efc48089] (duration: 00m 55s)
  • 09:58 joal@deploy1003: Started deploy [analytics/refinery@efc4808] (thin): Analytics webrequest migration THIN [analytics/refinery@efc48089]
  • 09:57 joal@deploy1003: Finished deploy [analytics/refinery@efc4808]: Analytics webrequest migration [analytics/refinery@efc48089] (duration: 02m 24s)
  • 09:57 moritzm: installing freetype security updates
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
  • 09:55 akosiaris@deploy1003: Started scap sync-world: Backport for typos: Add wnmet as a typo
  • 09:55 akosiaris: scap backport a noop change https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1133069 for T390251
  • 09:55 joal@deploy1003: Started deploy [analytics/refinery@efc4808]: Analytics webrequest migration [analytics/refinery@efc48089]
  • 09:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
  • 09:50 elukey: restart nginx on registry* to pick up the debug changes
  • 09:42 volans@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1001.eqiad.wmnet with reason: test
  • 09:39 gmodena@deploy1003: Finished deploy [airflow-dags/search@ed0fc78]: Deploy mjolnir-2.7.0.dev.conda.tgz (duration: 01m 29s)
  • 09:38 gmodena@deploy1003: Started deploy [airflow-dags/search@ed0fc78]: Deploy mjolnir-2.7.0.dev.conda.tgz
  • 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm
  • 09:27 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
  • 09:26 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
  • 09:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:58 dcausse@deploy1003: Finished deploy [wdqs/wdqs@354b5ac]: revert T326311, deletion query way too slow (duration: 12m 15s)
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:50 hashar@deploy1003: Finished deploy [integration/docroot@5256e19]: build: Updating eslint-config-wikimedia to 0.29.1 (duration: 00m 09s)
  • 08:50 hashar@deploy1003: Started deploy [integration/docroot@5256e19]: build: Updating eslint-config-wikimedia to 0.29.1
  • 08:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw1-eqiad
  • 08:46 topranks: Drain Lumen cct from codfw to ulsfo due to instability T390660
  • 08:46 dcausse@deploy1003: Started deploy [wdqs/wdqs@354b5ac]: revert T326311, deletion query way too slow
  • 08:45 volans: upgrading spicerack to v10.0.0 on cumin2002
  • 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw1-eqiad
  • 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw2-eqiad
  • 08:38 marostegui@cumin1002: START - Cookbook sre.mysql.pool db1211 slowly with 10 steps - Pool db1211.eqiad.wmnet in after cloning
  • 08:36 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw2-eqiad
  • 08:36 moritzm: failover ganeti master in ulsfo to ganeti4005 T382511
  • 08:35 volans: temporarily disable puppet on cumin1002 for the spicerack upgrade to v10.0.0
  • 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw1-codfw
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4007
  • 08:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4007
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:32 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw1-codfw
  • 08:29 elukey: set debug logging for registry*'s nginx - T390251
  • 08:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw2-codfw
  • 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:27 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw2-codfw
  • 08:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-eqiad
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 08:18 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-eqiad
  • 08:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-eqsin
  • 08:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 08:16 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 08:14 dcausse: T390665: restart blazegraph on wdqs2017
  • 08:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 08:12 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 08:12 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 08:11 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-eqsin
  • 08:11 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 08:11 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 08:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-esams
  • 08:05 dcausse: restarting blazegraph on wdqs2016
  • 08:04 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 08:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4007.ulsfo.wmnet with OS bookworm
  • 08:00 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 07:59 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 07:59 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-esams
  • 07:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-drmrs
  • 07:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-drmrs
  • 07:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-magru
  • 07:47 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 07:46 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:44 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-magru
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
  • 07:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-codfw
  • 07:37 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
  • 07:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-codfw
  • 07:34 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 07:31 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
  • 07:30 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 07:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
  • 07:28 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
  • 07:28 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
  • 07:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-ulsfo
  • 07:24 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 07:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4007.ulsfo.wmnet with OS bookworm
  • 07:19 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
  • 07:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1b-eqiad
  • 07:17 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1b-eqiad
  • 06:14 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1257.eqiad.wmnet
  • 05:33 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@557a834]: 0.3.155 (duration: 12m 49s)
  • 05:22 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.155` on canary `wdqs1015`; proceeding to rest of fleet
  • 05:20 ryankemper@deploy1003: Started deploy [wdqs/wdqs@557a834]: 0.3.155
  • 05:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.155`. Pre-deploy tests passing on canary `wdqs1016`
  • 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.20 (duration: 04m 34s)
  • 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.23 refs T386218


Archives

See Server Admin Log/Archives.