Server Admin Log/Archive 84

From Wikitech

2024-08-31

  • 15:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T371742)', diff saved to https://phabricator.wikimedia.org/P68498 and previous config saved to /var/cache/conftool/dbconfig/20240831-155331-ladsgroup.json
  • 15:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P68497 and previous config saved to /var/cache/conftool/dbconfig/20240831-153824-ladsgroup.json
  • 15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T370903)', diff saved to https://phabricator.wikimedia.org/P68496 and previous config saved to /var/cache/conftool/dbconfig/20240831-153309-ladsgroup.json
  • 15:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P68495 and previous config saved to /var/cache/conftool/dbconfig/20240831-152317-ladsgroup.json
  • 15:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P68494 and previous config saved to /var/cache/conftool/dbconfig/20240831-151802-ladsgroup.json
  • 15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T371742)', diff saved to https://phabricator.wikimedia.org/P68493 and previous config saved to /var/cache/conftool/dbconfig/20240831-150810-ladsgroup.json
  • 15:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P68492 and previous config saved to /var/cache/conftool/dbconfig/20240831-150254-ladsgroup.json
  • 14:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T367856)', diff saved to https://phabricator.wikimedia.org/P68491 and previous config saved to /var/cache/conftool/dbconfig/20240831-145733-marostegui.json
  • 14:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 14:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 14:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T367856)', diff saved to https://phabricator.wikimedia.org/P68490 and previous config saved to /var/cache/conftool/dbconfig/20240831-145712-marostegui.json
  • 14:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T370903)', diff saved to https://phabricator.wikimedia.org/P68489 and previous config saved to /var/cache/conftool/dbconfig/20240831-144748-ladsgroup.json
  • 14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P68488 and previous config saved to /var/cache/conftool/dbconfig/20240831-144204-marostegui.json
  • 14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T370903)', diff saved to https://phabricator.wikimedia.org/P68487 and previous config saved to /var/cache/conftool/dbconfig/20240831-143348-ladsgroup.json
  • 14:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68486 and previous config saved to /var/cache/conftool/dbconfig/20240831-143326-ladsgroup.json
  • 14:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P68485 and previous config saved to /var/cache/conftool/dbconfig/20240831-142657-marostegui.json
  • 14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P68484 and previous config saved to /var/cache/conftool/dbconfig/20240831-141819-ladsgroup.json
  • 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T367856)', diff saved to https://phabricator.wikimedia.org/P68483 and previous config saved to /var/cache/conftool/dbconfig/20240831-141150-marostegui.json
  • 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T371742)', diff saved to https://phabricator.wikimedia.org/P68482 and previous config saved to /var/cache/conftool/dbconfig/20240831-141011-ladsgroup.json
  • 14:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 14:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T371742)', diff saved to https://phabricator.wikimedia.org/P68481 and previous config saved to /var/cache/conftool/dbconfig/20240831-140949-ladsgroup.json
  • 14:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P68480 and previous config saved to /var/cache/conftool/dbconfig/20240831-140311-ladsgroup.json
  • 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P68479 and previous config saved to /var/cache/conftool/dbconfig/20240831-135442-ladsgroup.json
  • 13:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68478 and previous config saved to /var/cache/conftool/dbconfig/20240831-134804-ladsgroup.json
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P68477 and previous config saved to /var/cache/conftool/dbconfig/20240831-133935-ladsgroup.json
  • 13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68476 and previous config saved to /var/cache/conftool/dbconfig/20240831-133349-ladsgroup.json
  • 13:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 13:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T371742)', diff saved to https://phabricator.wikimedia.org/P68475 and previous config saved to /var/cache/conftool/dbconfig/20240831-132428-ladsgroup.json
  • 13:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68474 and previous config saved to /var/cache/conftool/dbconfig/20240831-131907-ladsgroup.json
  • 13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P68473 and previous config saved to /var/cache/conftool/dbconfig/20240831-130400-ladsgroup.json
  • 12:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P68472 and previous config saved to /var/cache/conftool/dbconfig/20240831-124853-ladsgroup.json
  • 12:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68471 and previous config saved to /var/cache/conftool/dbconfig/20240831-123346-ladsgroup.json
  • 12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T371742)', diff saved to https://phabricator.wikimedia.org/P68470 and previous config saved to /var/cache/conftool/dbconfig/20240831-122900-ladsgroup.json
  • 12:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 12:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68469 and previous config saved to /var/cache/conftool/dbconfig/20240831-121937-ladsgroup.json
  • 12:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 12:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T370903)', diff saved to https://phabricator.wikimedia.org/P68468 and previous config saved to /var/cache/conftool/dbconfig/20240831-121915-ladsgroup.json
  • 12:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P68467 and previous config saved to /var/cache/conftool/dbconfig/20240831-120409-ladsgroup.json
  • 11:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P68466 and previous config saved to /var/cache/conftool/dbconfig/20240831-114902-ladsgroup.json
  • 11:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 11:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T370903)', diff saved to https://phabricator.wikimedia.org/P68465 and previous config saved to /var/cache/conftool/dbconfig/20240831-113355-ladsgroup.json
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T370903)', diff saved to https://phabricator.wikimedia.org/P68464 and previous config saved to /var/cache/conftool/dbconfig/20240831-111528-ladsgroup.json
  • 11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T370903)', diff saved to https://phabricator.wikimedia.org/P68463 and previous config saved to /var/cache/conftool/dbconfig/20240831-111506-ladsgroup.json
  • 10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P68462 and previous config saved to /var/cache/conftool/dbconfig/20240831-105959-ladsgroup.json
  • 10:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T371742)', diff saved to https://phabricator.wikimedia.org/P68461 and previous config saved to /var/cache/conftool/dbconfig/20240831-104829-ladsgroup.json
  • 10:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P68460 and previous config saved to /var/cache/conftool/dbconfig/20240831-104452-ladsgroup.json
  • 10:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P68459 and previous config saved to /var/cache/conftool/dbconfig/20240831-103322-ladsgroup.json
  • 10:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T370903)', diff saved to https://phabricator.wikimedia.org/P68458 and previous config saved to /var/cache/conftool/dbconfig/20240831-102944-ladsgroup.json
  • 10:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P68457 and previous config saved to /var/cache/conftool/dbconfig/20240831-101815-ladsgroup.json
  • 10:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T370903)', diff saved to https://phabricator.wikimedia.org/P68456 and previous config saved to /var/cache/conftool/dbconfig/20240831-101131-ladsgroup.json
  • 10:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 10:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 10:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T370903)', diff saved to https://phabricator.wikimedia.org/P68455 and previous config saved to /var/cache/conftool/dbconfig/20240831-101109-ladsgroup.json
  • 10:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T371742)', diff saved to https://phabricator.wikimedia.org/P68454 and previous config saved to /var/cache/conftool/dbconfig/20240831-100308-ladsgroup.json
  • 09:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P68453 and previous config saved to /var/cache/conftool/dbconfig/20240831-095602-ladsgroup.json
  • 09:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P68452 and previous config saved to /var/cache/conftool/dbconfig/20240831-094055-ladsgroup.json
  • 09:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T370903)', diff saved to https://phabricator.wikimedia.org/P68451 and previous config saved to /var/cache/conftool/dbconfig/20240831-092548-ladsgroup.json
  • 09:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T370903)', diff saved to https://phabricator.wikimedia.org/P68450 and previous config saved to /var/cache/conftool/dbconfig/20240831-090843-ladsgroup.json
  • 09:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T370903)', diff saved to https://phabricator.wikimedia.org/P68449 and previous config saved to /var/cache/conftool/dbconfig/20240831-090817-ladsgroup.json
  • 09:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T371742)', diff saved to https://phabricator.wikimedia.org/P68448 and previous config saved to /var/cache/conftool/dbconfig/20240831-090155-ladsgroup.json
  • 09:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 09:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 09:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T371742)', diff saved to https://phabricator.wikimedia.org/P68447 and previous config saved to /var/cache/conftool/dbconfig/20240831-090133-ladsgroup.json
  • 08:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P68446 and previous config saved to /var/cache/conftool/dbconfig/20240831-085310-ladsgroup.json
  • 08:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P68445 and previous config saved to /var/cache/conftool/dbconfig/20240831-084626-ladsgroup.json
  • 08:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P68444 and previous config saved to /var/cache/conftool/dbconfig/20240831-083803-ladsgroup.json
  • 08:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P68443 and previous config saved to /var/cache/conftool/dbconfig/20240831-083118-ladsgroup.json
  • 08:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T370903)', diff saved to https://phabricator.wikimedia.org/P68442 and previous config saved to /var/cache/conftool/dbconfig/20240831-082256-ladsgroup.json
  • 08:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T371742)', diff saved to https://phabricator.wikimedia.org/P68441 and previous config saved to /var/cache/conftool/dbconfig/20240831-081611-ladsgroup.json
  • 08:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T370903)', diff saved to https://phabricator.wikimedia.org/P68440 and previous config saved to /var/cache/conftool/dbconfig/20240831-080733-ladsgroup.json
  • 08:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 08:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 08:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T370903)', diff saved to https://phabricator.wikimedia.org/P68439 and previous config saved to /var/cache/conftool/dbconfig/20240831-080700-ladsgroup.json
  • 07:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P68438 and previous config saved to /var/cache/conftool/dbconfig/20240831-075152-ladsgroup.json
  • 07:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P68437 and previous config saved to /var/cache/conftool/dbconfig/20240831-073645-ladsgroup.json
  • 07:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T370903)', diff saved to https://phabricator.wikimedia.org/P68436 and previous config saved to /var/cache/conftool/dbconfig/20240831-072138-ladsgroup.json
  • 07:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T371742)', diff saved to https://phabricator.wikimedia.org/P68435 and previous config saved to /var/cache/conftool/dbconfig/20240831-071243-ladsgroup.json
  • 07:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T371742)', diff saved to https://phabricator.wikimedia.org/P68434 and previous config saved to /var/cache/conftool/dbconfig/20240831-071221-ladsgroup.json
  • 07:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T370903)', diff saved to https://phabricator.wikimedia.org/P68433 and previous config saved to /var/cache/conftool/dbconfig/20240831-070333-ladsgroup.json
  • 07:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 07:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 07:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T370903)', diff saved to https://phabricator.wikimedia.org/P68432 and previous config saved to /var/cache/conftool/dbconfig/20240831-070311-ladsgroup.json
  • 06:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P68431 and previous config saved to /var/cache/conftool/dbconfig/20240831-065714-ladsgroup.json
  • 06:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P68430 and previous config saved to /var/cache/conftool/dbconfig/20240831-064803-ladsgroup.json
  • 06:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P68429 and previous config saved to /var/cache/conftool/dbconfig/20240831-064207-ladsgroup.json
  • 06:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P68428 and previous config saved to /var/cache/conftool/dbconfig/20240831-063256-ladsgroup.json
  • 06:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T371742)', diff saved to https://phabricator.wikimedia.org/P68427 and previous config saved to /var/cache/conftool/dbconfig/20240831-062659-ladsgroup.json
  • 06:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T370903)', diff saved to https://phabricator.wikimedia.org/P68426 and previous config saved to /var/cache/conftool/dbconfig/20240831-061749-ladsgroup.json
  • 05:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T370903)', diff saved to https://phabricator.wikimedia.org/P68425 and previous config saved to /var/cache/conftool/dbconfig/20240831-055741-ladsgroup.json
  • 05:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 05:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 05:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T370903)', diff saved to https://phabricator.wikimedia.org/P68424 and previous config saved to /var/cache/conftool/dbconfig/20240831-055719-ladsgroup.json
  • 05:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P68423 and previous config saved to /var/cache/conftool/dbconfig/20240831-054211-ladsgroup.json
  • 05:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P68422 and previous config saved to /var/cache/conftool/dbconfig/20240831-052704-ladsgroup.json
  • 05:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T371742)', diff saved to https://phabricator.wikimedia.org/P68421 and previous config saved to /var/cache/conftool/dbconfig/20240831-052543-ladsgroup.json
  • 05:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 05:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 05:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T371742)', diff saved to https://phabricator.wikimedia.org/P68420 and previous config saved to /var/cache/conftool/dbconfig/20240831-052516-ladsgroup.json
  • 05:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T370903)', diff saved to https://phabricator.wikimedia.org/P68419 and previous config saved to /var/cache/conftool/dbconfig/20240831-051157-ladsgroup.json
  • 05:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P68418 and previous config saved to /var/cache/conftool/dbconfig/20240831-051009-ladsgroup.json
  • 04:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P68417 and previous config saved to /var/cache/conftool/dbconfig/20240831-045501-ladsgroup.json
  • 04:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T370903)', diff saved to https://phabricator.wikimedia.org/P68416 and previous config saved to /var/cache/conftool/dbconfig/20240831-045435-ladsgroup.json
  • 04:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 04:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 04:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T371742)', diff saved to https://phabricator.wikimedia.org/P68415 and previous config saved to /var/cache/conftool/dbconfig/20240831-043954-ladsgroup.json
  • 04:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 04:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 04:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T370903)', diff saved to https://phabricator.wikimedia.org/P68414 and previous config saved to /var/cache/conftool/dbconfig/20240831-043621-ladsgroup.json
  • 04:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P68413 and previous config saved to /var/cache/conftool/dbconfig/20240831-042114-ladsgroup.json
  • 04:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P68412 and previous config saved to /var/cache/conftool/dbconfig/20240831-040607-ladsgroup.json
  • 03:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T370903)', diff saved to https://phabricator.wikimedia.org/P68411 and previous config saved to /var/cache/conftool/dbconfig/20240831-035100-ladsgroup.json
  • 03:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T371742)', diff saved to https://phabricator.wikimedia.org/P68410 and previous config saved to /var/cache/conftool/dbconfig/20240831-033831-ladsgroup.json
  • 03:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 03:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 03:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T371742)', diff saved to https://phabricator.wikimedia.org/P68409 and previous config saved to /var/cache/conftool/dbconfig/20240831-033809-ladsgroup.json
  • 03:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T370903)', diff saved to https://phabricator.wikimedia.org/P68408 and previous config saved to /var/cache/conftool/dbconfig/20240831-033310-ladsgroup.json
  • 03:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 03:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T370903)', diff saved to https://phabricator.wikimedia.org/P68407 and previous config saved to /var/cache/conftool/dbconfig/20240831-033248-ladsgroup.json
  • 03:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P68406 and previous config saved to /var/cache/conftool/dbconfig/20240831-032302-ladsgroup.json
  • 03:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P68405 and previous config saved to /var/cache/conftool/dbconfig/20240831-031741-ladsgroup.json
  • 03:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P68404 and previous config saved to /var/cache/conftool/dbconfig/20240831-030755-ladsgroup.json
  • 03:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P68403 and previous config saved to /var/cache/conftool/dbconfig/20240831-030234-ladsgroup.json
  • 02:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T371742)', diff saved to https://phabricator.wikimedia.org/P68402 and previous config saved to /var/cache/conftool/dbconfig/20240831-025248-ladsgroup.json
  • 02:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T370903)', diff saved to https://phabricator.wikimedia.org/P68401 and previous config saved to /var/cache/conftool/dbconfig/20240831-024727-ladsgroup.json
  • 02:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T370903)', diff saved to https://phabricator.wikimedia.org/P68400 and previous config saved to /var/cache/conftool/dbconfig/20240831-022822-ladsgroup.json
  • 02:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 02:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 02:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 01:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 01:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T371742)', diff saved to https://phabricator.wikimedia.org/P68399 and previous config saved to /var/cache/conftool/dbconfig/20240831-015132-ladsgroup.json
  • 01:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 01:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T371742)', diff saved to https://phabricator.wikimedia.org/P68398 and previous config saved to /var/cache/conftool/dbconfig/20240831-015110-ladsgroup.json
  • 01:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P68397 and previous config saved to /var/cache/conftool/dbconfig/20240831-013603-ladsgroup.json
  • 01:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 01:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T370903)', diff saved to https://phabricator.wikimedia.org/P68396 and previous config saved to /var/cache/conftool/dbconfig/20240831-013254-ladsgroup.json
  • 01:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P68395 and previous config saved to /var/cache/conftool/dbconfig/20240831-012055-ladsgroup.json
  • 01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P68394 and previous config saved to /var/cache/conftool/dbconfig/20240831-011746-ladsgroup.json
  • 01:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T371742)', diff saved to https://phabricator.wikimedia.org/P68393 and previous config saved to /var/cache/conftool/dbconfig/20240831-010548-ladsgroup.json
  • 01:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P68392 and previous config saved to /var/cache/conftool/dbconfig/20240831-010239-ladsgroup.json
  • 00:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T370903)', diff saved to https://phabricator.wikimedia.org/P68391 and previous config saved to /var/cache/conftool/dbconfig/20240831-004732-ladsgroup.json
  • 00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T370903)', diff saved to https://phabricator.wikimedia.org/P68390 and previous config saved to /var/cache/conftool/dbconfig/20240831-002842-ladsgroup.json
  • 00:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 00:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T370903)', diff saved to https://phabricator.wikimedia.org/P68389 and previous config saved to /var/cache/conftool/dbconfig/20240831-002819-ladsgroup.json
  • 00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P68388 and previous config saved to /var/cache/conftool/dbconfig/20240831-001312-ladsgroup.json
  • 00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2121 (T371742)', diff saved to https://phabricator.wikimedia.org/P68387 and previous config saved to /var/cache/conftool/dbconfig/20240831-000400-ladsgroup.json
  • 00:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance

2024-08-30

  • 23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P68386 and previous config saved to /var/cache/conftool/dbconfig/20240830-235804-ladsgroup.json
  • 23:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 23:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 23:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T371742)', diff saved to https://phabricator.wikimedia.org/P68385 and previous config saved to /var/cache/conftool/dbconfig/20240830-234621-ladsgroup.json
  • 23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T370903)', diff saved to https://phabricator.wikimedia.org/P68384 and previous config saved to /var/cache/conftool/dbconfig/20240830-234257-ladsgroup.json
  • 23:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P68383 and previous config saved to /var/cache/conftool/dbconfig/20240830-233113-ladsgroup.json
  • 23:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P68382 and previous config saved to /var/cache/conftool/dbconfig/20240830-231606-ladsgroup.json
  • 23:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T371742)', diff saved to https://phabricator.wikimedia.org/P68381 and previous config saved to /var/cache/conftool/dbconfig/20240830-230059-ladsgroup.json
  • 22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T370903)', diff saved to https://phabricator.wikimedia.org/P68380 and previous config saved to /var/cache/conftool/dbconfig/20240830-225902-ladsgroup.json
  • 22:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 22:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 22:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T370903)', diff saved to https://phabricator.wikimedia.org/P68379 and previous config saved to /var/cache/conftool/dbconfig/20240830-225840-ladsgroup.json
  • 22:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P68378 and previous config saved to /var/cache/conftool/dbconfig/20240830-224333-ladsgroup.json
  • 22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P68377 and previous config saved to /var/cache/conftool/dbconfig/20240830-222826-ladsgroup.json
  • 22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T370903)', diff saved to https://phabricator.wikimedia.org/P68376 and previous config saved to /var/cache/conftool/dbconfig/20240830-221319-ladsgroup.json
  • 21:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T370903)', diff saved to https://phabricator.wikimedia.org/P68375 and previous config saved to /var/cache/conftool/dbconfig/20240830-215611-ladsgroup.json
  • 21:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 21:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T370903)', diff saved to https://phabricator.wikimedia.org/P68374 and previous config saved to /var/cache/conftool/dbconfig/20240830-215549-ladsgroup.json
  • 21:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T371742)', diff saved to https://phabricator.wikimedia.org/P68373 and previous config saved to /var/cache/conftool/dbconfig/20240830-214558-ladsgroup.json
  • 21:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 21:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T371742)', diff saved to https://phabricator.wikimedia.org/P68372 and previous config saved to /var/cache/conftool/dbconfig/20240830-214536-ladsgroup.json
  • 21:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P68371 and previous config saved to /var/cache/conftool/dbconfig/20240830-214042-ladsgroup.json
  • 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P68370 and previous config saved to /var/cache/conftool/dbconfig/20240830-213028-ladsgroup.json
  • 21:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P68369 and previous config saved to /var/cache/conftool/dbconfig/20240830-212535-ladsgroup.json
  • 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P68368 and previous config saved to /var/cache/conftool/dbconfig/20240830-211521-ladsgroup.json
  • 21:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T370903)', diff saved to https://phabricator.wikimedia.org/P68367 and previous config saved to /var/cache/conftool/dbconfig/20240830-211028-ladsgroup.json
  • 21:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T371742)', diff saved to https://phabricator.wikimedia.org/P68366 and previous config saved to /var/cache/conftool/dbconfig/20240830-210014-ladsgroup.json
  • 20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T370903)', diff saved to https://phabricator.wikimedia.org/P68365 and previous config saved to /var/cache/conftool/dbconfig/20240830-201956-ladsgroup.json
  • 20:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 20:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68364 and previous config saved to /var/cache/conftool/dbconfig/20240830-201934-ladsgroup.json
  • 20:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T371742)', diff saved to https://phabricator.wikimedia.org/P68363 and previous config saved to /var/cache/conftool/dbconfig/20240830-200606-ladsgroup.json
  • 20:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T371742)', diff saved to https://phabricator.wikimedia.org/P68362 and previous config saved to /var/cache/conftool/dbconfig/20240830-200544-ladsgroup.json
  • 20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P68361 and previous config saved to /var/cache/conftool/dbconfig/20240830-200427-ladsgroup.json
  • 19:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P68359 and previous config saved to /var/cache/conftool/dbconfig/20240830-195037-ladsgroup.json
  • 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P68358 and previous config saved to /var/cache/conftool/dbconfig/20240830-194919-ladsgroup.json
  • 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P68357 and previous config saved to /var/cache/conftool/dbconfig/20240830-193528-ladsgroup.json
  • 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68356 and previous config saved to /var/cache/conftool/dbconfig/20240830-193413-ladsgroup.json
  • 19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T371742)', diff saved to https://phabricator.wikimedia.org/P68355 and previous config saved to /var/cache/conftool/dbconfig/20240830-192021-ladsgroup.json
  • 18:59 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1014.eqiad.wmnet
  • 18:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68354 and previous config saved to /var/cache/conftool/dbconfig/20240830-185427-ladsgroup.json
  • 18:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 18:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 18:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68353 and previous config saved to /var/cache/conftool/dbconfig/20240830-185405-ladsgroup.json
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T371742)', diff saved to https://phabricator.wikimedia.org/P68352 and previous config saved to /var/cache/conftool/dbconfig/20240830-185341-ladsgroup.json
  • 18:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 18:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T371742)', diff saved to https://phabricator.wikimedia.org/P68351 and previous config saved to /var/cache/conftool/dbconfig/20240830-185319-ladsgroup.json
  • 18:51 eevans@cumin1002: START - Cookbook sre.hosts.reboot-single for host aqs1014.eqiad.wmnet
  • 18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P68350 and previous config saved to /var/cache/conftool/dbconfig/20240830-183858-ladsgroup.json
  • 18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P68349 and previous config saved to /var/cache/conftool/dbconfig/20240830-183812-ladsgroup.json
  • 18:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P68348 and previous config saved to /var/cache/conftool/dbconfig/20240830-182350-ladsgroup.json
  • 18:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P68347 and previous config saved to /var/cache/conftool/dbconfig/20240830-182304-ladsgroup.json
  • 18:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68346 and previous config saved to /var/cache/conftool/dbconfig/20240830-180843-ladsgroup.json
  • 18:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T371742)', diff saved to https://phabricator.wikimedia.org/P68345 and previous config saved to /var/cache/conftool/dbconfig/20240830-180757-ladsgroup.json
  • 17:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68344 and previous config saved to /var/cache/conftool/dbconfig/20240830-174822-ladsgroup.json
  • 17:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 17:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 17:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T370903)', diff saved to https://phabricator.wikimedia.org/P68343 and previous config saved to /var/cache/conftool/dbconfig/20240830-174800-ladsgroup.json
  • 17:44 mutante: releases1003/2003 - sudo apt-get remove openjdk-11-* - Java 11 has been replaced by Java 17 - T359795
  • 17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T371742)', diff saved to https://phabricator.wikimedia.org/P68342 and previous config saved to /var/cache/conftool/dbconfig/20240830-173905-ladsgroup.json
  • 17:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T371742)', diff saved to https://phabricator.wikimedia.org/P68341 and previous config saved to /var/cache/conftool/dbconfig/20240830-173843-ladsgroup.json
  • 17:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P68340 and previous config saved to /var/cache/conftool/dbconfig/20240830-173253-ladsgroup.json
  • 17:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P68339 and previous config saved to /var/cache/conftool/dbconfig/20240830-172336-ladsgroup.json
  • 17:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P68338 and previous config saved to /var/cache/conftool/dbconfig/20240830-171745-ladsgroup.json
  • 17:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P68337 and previous config saved to /var/cache/conftool/dbconfig/20240830-170829-ladsgroup.json
  • 17:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T370903)', diff saved to https://phabricator.wikimedia.org/P68336 and previous config saved to /var/cache/conftool/dbconfig/20240830-170238-ladsgroup.json
  • 16:59 swfrench-wmf: running homer 'cr*codfw*' commit 'T372878'
  • 16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T371742)', diff saved to https://phabricator.wikimedia.org/P68335 and previous config saved to /var/cache/conftool/dbconfig/20240830-165322-ladsgroup.json
  • 16:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T370903)', diff saved to https://phabricator.wikimedia.org/P68334 and previous config saved to /var/cache/conftool/dbconfig/20240830-164425-ladsgroup.json
  • 16:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T370903)', diff saved to https://phabricator.wikimedia.org/P68333 and previous config saved to /var/cache/conftool/dbconfig/20240830-164403-ladsgroup.json
  • 16:42 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2065.codfw.wmnet
  • 16:42 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2065.codfw.wmnet
  • 16:42 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2064.codfw.wmnet
  • 16:42 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2064.codfw.wmnet
  • 16:40 swfrench-wmf: running homer 'lsw1-b3-codfw*' commit 'T372878'
  • 16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1033.eqiad.wmnet
  • 16:39 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1033.eqiad.wmnet
  • 16:39 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2065.codfw.wmnet with OS bullseye
  • 16:32 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2064.codfw.wmnet with OS bullseye
  • 16:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P68332 and previous config saved to /var/cache/conftool/dbconfig/20240830-162856-ladsgroup.json
  • 16:26 claime: homer 'cr*eqiad*' commit 'T351074, T372878, and fix ml-serve and dse-k8s bgp'
  • 16:23 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2059.codfw.wmnet
  • 16:23 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2059.codfw.wmnet
  • 16:23 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2058.codfw.wmnet
  • 16:23 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2058.codfw.wmnet
  • 16:23 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2057.codfw.wmnet
  • 16:23 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2057.codfw.wmnet
  • 16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T371742)', diff saved to https://phabricator.wikimedia.org/P68331 and previous config saved to /var/cache/conftool/dbconfig/20240830-162258-ladsgroup.json
  • 16:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 16:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 16:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T371742)', diff saved to https://phabricator.wikimedia.org/P68330 and previous config saved to /var/cache/conftool/dbconfig/20240830-162236-ladsgroup.json
  • 16:21 claime: flipping BGP flag to true in netbox for ml-serve-ctrl100[1-2],ml-serve100[1-4],dse-k8s-ctrl100[1-2],dse-k8s-worker100[1-4]
  • 16:19 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2065.codfw.wmnet with reason: host reimage
  • 16:15 swfrench@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2065.codfw.wmnet with reason: host reimage
  • 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P68329 and previous config saved to /var/cache/conftool/dbconfig/20240830-161349-ladsgroup.json
  • 16:12 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2064.codfw.wmnet with reason: host reimage
  • 16:09 swfrench@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2064.codfw.wmnet with reason: host reimage
  • 16:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2062.codfw.wmnet
  • 16:07 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2062.codfw.wmnet
  • 16:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P68328 and previous config saved to /var/cache/conftool/dbconfig/20240830-160729-ladsgroup.json
  • 16:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2061.codfw.wmnet
  • 16:07 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2061.codfw.wmnet
  • 16:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2060.codfw.wmnet
  • 16:02 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2060.codfw.wmnet
  • 16:01 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2061.codfw.wmnet with OS bullseye
  • 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T370903)', diff saved to https://phabricator.wikimedia.org/P68326 and previous config saved to /var/cache/conftool/dbconfig/20240830-155842-ladsgroup.json
  • 15:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1033.eqiad.wmnet with OS bullseye
  • 15:57 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2065
  • 15:56 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2065
  • 15:56 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2065
  • 15:56 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2065.codfw.wmnet 235.16.192.10.in-addr.arpa 5.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:56 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2065.codfw.wmnet 235.16.192.10.in-addr.arpa 5.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:56 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2065 - swfrench@cumin2002"
  • 15:56 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2065 - swfrench@cumin2002"
  • 15:55 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2062.codfw.wmnet with OS bullseye
  • 15:53 hnowlan: homer 'lsw1-a3-codfw*' commit
  • 15:52 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 15:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P68325 and previous config saved to /var/cache/conftool/dbconfig/20240830-155222-ladsgroup.json
  • 15:52 swfrench@cumin2002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2065
  • 15:52 swfrench@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2065.codfw.wmnet with OS bullseye
  • 15:50 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2064
  • 15:50 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2064
  • 15:50 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2064
  • 15:50 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2064.codfw.wmnet 211.16.192.10.in-addr.arpa 1.1.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:50 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2064.codfw.wmnet 211.16.192.10.in-addr.arpa 1.1.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:50 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:49 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2064 - swfrench@cumin2002"
  • 15:49 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2064 - swfrench@cumin2002"
  • 15:49 claime: homer 'cr*eqiad*' commit 'T351074'
  • 15:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2063.codfw.wmnet
  • 15:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2063.codfw.wmnet
  • 15:47 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2060.codfw.wmnet with OS bullseye
  • 15:46 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 15:45 swfrench@cumin2002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2064
  • 15:45 swfrench@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2064.codfw.wmnet with OS bullseye
  • 15:44 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2064.codfw.wmnet wikikube-worker2065.codfw.wmnet on all recursors
  • 15:44 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2064.codfw.wmnet wikikube-worker2065.codfw.wmnet on all recursors
  • 15:44 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2057 to wikikube-worker2065
  • 15:43 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2065
  • 15:43 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2065
  • 15:43 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:43 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2057 to wikikube-worker2065 - swfrench@cumin2002"
  • 15:42 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2057 to wikikube-worker2065 - swfrench@cumin2002"
  • 15:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2061.codfw.wmnet with reason: host reimage
  • 15:41 claime: homer 'lsw1-a3-codfw*' commit 'T351074'
  • 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T370903)', diff saved to https://phabricator.wikimedia.org/P68323 and previous config saved to /var/cache/conftool/dbconfig/20240830-154054-ladsgroup.json
  • 15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2063.codfw.wmnet with OS bullseye
  • 15:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T370903)', diff saved to https://phabricator.wikimedia.org/P68322 and previous config saved to /var/cache/conftool/dbconfig/20240830-154004-ladsgroup.json
  • 15:39 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 15:39 swfrench@cumin2002: START - Cookbook sre.hosts.rename from kubernetes2057 to wikikube-worker2065
  • 15:38 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2061.codfw.wmnet with reason: host reimage
  • 15:38 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2030 to wikikube-worker2064
  • 15:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1033.eqiad.wmnet with reason: host reimage
  • 15:37 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2064
  • 15:37 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2064
  • 15:37 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2030 to wikikube-worker2064 - swfrench@cumin2002"
  • 15:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T371742)', diff saved to https://phabricator.wikimedia.org/P68320 and previous config saved to /var/cache/conftool/dbconfig/20240830-153715-ladsgroup.json
  • 15:37 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2030 to wikikube-worker2064 - swfrench@cumin2002"
  • 15:35 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2062.codfw.wmnet with reason: host reimage
  • 15:33 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 15:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1033.eqiad.wmnet with reason: host reimage
  • 15:33 swfrench@cumin2002: START - Cookbook sre.hosts.rename from kubernetes2030 to wikikube-worker2064
  • 15:31 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2062.codfw.wmnet with reason: host reimage
  • 15:29 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2057.codfw.wmnet
  • 15:29 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2057.codfw.wmnet
  • 15:28 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2030.codfw.wmnet
  • 15:28 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2030.codfw.wmnet
  • 15:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2060.codfw.wmnet with reason: host reimage
  • 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P68319 and previous config saved to /var/cache/conftool/dbconfig/20240830-152457-ladsgroup.json
  • 15:23 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2060.codfw.wmnet with reason: host reimage
  • 15:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2061
  • 15:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2061
  • 15:20 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2061
  • 15:20 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2061.codfw.wmnet 47.0.192.10.in-addr.arpa 7.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:20 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2061.codfw.wmnet 47.0.192.10.in-addr.arpa 7.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:20 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2061 - hnowlan@cumin1002"
  • 15:20 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2061 - hnowlan@cumin1002"
  • 15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2063.codfw.wmnet with reason: host reimage
  • 15:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1033.eqiad.wmnet with OS bullseye
  • 15:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1033.eqiad.wmnet on all recursors
  • 15:19 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1033.eqiad.wmnet on all recursors
  • 15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1398 to wikikube-worker1033
  • 15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1033
  • 15:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1033
  • 15:17 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 15:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1398 to wikikube-worker1033 - cgoubert@cumin1002"
  • 15:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2063.codfw.wmnet with reason: host reimage
  • 15:17 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1398 to wikikube-worker1033 - cgoubert@cumin1002"
  • 15:16 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2061
  • 15:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2062
  • 15:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2062
  • 15:15 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2062
  • 15:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2062.codfw.wmnet 48.0.192.10.in-addr.arpa 8.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:15 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2062.codfw.wmnet 48.0.192.10.in-addr.arpa 8.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2062 - hnowlan@cumin1002"
  • 15:13 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:13 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2062 - hnowlan@cumin1002"
  • 15:12 klausman@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:12 klausman@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:11 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:11 klausman@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T371742)', diff saved to https://phabricator.wikimedia.org/P68318 and previous config saved to /var/cache/conftool/dbconfig/20240830-151128-ladsgroup.json
  • 15:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P68317 and previous config saved to /var/cache/conftool/dbconfig/20240830-150950-ladsgroup.json
  • 15:08 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1398 to wikikube-worker1033
  • 15:08 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:08 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 15:08 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2060.codfw.wmnet with OS bullseye
  • 15:07 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:07 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:07 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2060.codfw.wmnet with OS bullseye
  • 15:07 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2062
  • 15:07 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:07 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2060.codfw.wmnet with OS bullseye
  • 15:07 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2061.codfw.wmnet with OS bullseye
  • 15:07 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2062.codfw.wmnet with OS bullseye
  • 15:06 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:06 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:05 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2383 to wikikube-worker2060
  • 15:04 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2060
  • 15:02 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2060
  • 15:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2063
  • 15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2063
  • 15:00 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 14:58 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:58 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2384 to wikikube-worker2061
  • 14:58 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2063
  • 14:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2063.codfw.wmnet 169.0.192.10.in-addr.arpa 9.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:58 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2063.codfw.wmnet 169.0.192.10.in-addr.arpa 9.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:58 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:58 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2061
  • 14:57 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:57 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2061
  • 14:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2384 to wikikube-worker2061 - hnowlan@cumin1002"
  • 14:57 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:57 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2384 to wikikube-worker2061 - hnowlan@cumin1002"
  • 14:56 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:56 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T370903)', diff saved to https://phabricator.wikimedia.org/P68316 and previous config saved to /var/cache/conftool/dbconfig/20240830-145442-ladsgroup.json
  • 14:51 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2385 to wikikube-worker2062
  • 14:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2062
  • 14:50 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 14:50 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2062
  • 14:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2385 to wikikube-worker2062 - hnowlan@cumin1002"
  • 14:50 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2385 to wikikube-worker2062 - hnowlan@cumin1002"
  • 14:49 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2063
  • 14:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2063.codfw.wmnet with OS bullseye
  • 14:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2299 to wikikube-worker2063
  • 14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2063
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2063
  • 14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2299 to wikikube-worker2063 - cgoubert@cumin1002"
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2299 to wikikube-worker2063 - cgoubert@cumin1002"
  • 14:46 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 14:44 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2058.codfw.wmnet with OS bullseye
  • 14:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2385 to wikikube-worker2062
  • 14:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2384 to wikikube-worker2061
  • 14:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:40 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2383 to wikikube-worker2060
  • 14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2299 to wikikube-worker2063
  • 14:38 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2057.codfw.wmnet with OS bullseye
  • 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T370903)', diff saved to https://phabricator.wikimedia.org/P68315 and previous config saved to /var/cache/conftool/dbconfig/20240830-143537-ladsgroup.json
  • 14:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T370903)', diff saved to https://phabricator.wikimedia.org/P68314 and previous config saved to /var/cache/conftool/dbconfig/20240830-143516-ladsgroup.json
  • 14:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2059.codfw.wmnet with OS bullseye
  • 14:31 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2385.codfw.wmnet
  • 14:31 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2385.codfw.wmnet
  • 14:30 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2384.codfw.wmnet
  • 14:30 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2384.codfw.wmnet
  • 14:28 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2383.codfw.wmnet
  • 14:28 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2383.codfw.wmnet
  • 14:24 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2058.codfw.wmnet with reason: host reimage
  • 14:22 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2058.codfw.wmnet with reason: host reimage
  • 14:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P68313 and previous config saved to /var/cache/conftool/dbconfig/20240830-142008-ladsgroup.json
  • 14:18 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2057.codfw.wmnet with reason: host reimage
  • 14:15 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2057.codfw.wmnet with reason: host reimage
  • 14:14 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2059.codfw.wmnet with reason: host reimage
  • 14:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T371742)', diff saved to https://phabricator.wikimedia.org/P68312 and previous config saved to /var/cache/conftool/dbconfig/20240830-141311-ladsgroup.json
  • 14:11 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2059.codfw.wmnet with reason: host reimage
  • 14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2058
  • 14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2058
  • 14:06 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2058
  • 14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2058.codfw.wmnet 41.0.192.10.in-addr.arpa 1.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:06 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2058.codfw.wmnet 41.0.192.10.in-addr.arpa 1.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2058 - akosiaris@cumin1002"
  • 14:06 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2058 - akosiaris@cumin1002"
  • 14:05 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2056.codfw.wmnet
  • 14:05 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2056.codfw.wmnet
  • 14:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P68311 and previous config saved to /var/cache/conftool/dbconfig/20240830-140501-ladsgroup.json
  • 14:03 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 13:58 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2058
  • 13:58 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2057
  • 13:58 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2057
  • 13:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P68310 and previous config saved to /var/cache/conftool/dbconfig/20240830-135804-ladsgroup.json
  • 13:57 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2057
  • 13:57 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2057.codfw.wmnet 40.0.192.10.in-addr.arpa 0.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:56 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2057.codfw.wmnet 40.0.192.10.in-addr.arpa 0.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:56 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:56 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2057 - akosiaris@cumin1002"
  • 13:56 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2057 - akosiaris@cumin1002"
  • 13:55 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2059.codfw.wmnet with OS bullseye
  • 13:53 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2059.codfw.wmnet with OS bullseye
  • 13:53 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 13:53 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2059.codfw.wmnet with OS bullseye
  • 13:53 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2058.codfw.wmnet with OS bullseye
  • 13:52 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2057
  • 13:52 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2057.codfw.wmnet with OS bullseye
  • 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T370903)', diff saved to https://phabricator.wikimedia.org/P68309 and previous config saved to /var/cache/conftool/dbconfig/20240830-134954-ladsgroup.json
  • 13:46 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2379 to wikikube-worker2059
  • 13:45 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2059
  • 13:45 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2003.codfw.wmnet
  • 13:45 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2003.codfw.wmnet
  • 13:45 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2001.codfw.wmnet
  • 13:45 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2001.codfw.wmnet
  • 13:45 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2059
  • 13:45 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2379 to wikikube-worker2059 - akosiaris@cumin1002"
  • 13:43 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 13:43 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 13:43 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2001.codfw.wmnet
  • 13:43 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2001.codfw.wmnet
  • 13:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P68308 and previous config saved to /var/cache/conftool/dbconfig/20240830-134257-ladsgroup.json
  • 13:42 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2379 to wikikube-worker2059 - akosiaris@cumin1002"
  • 13:41 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2001.codfw.wmnet
  • 13:41 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2001.codfw.wmnet
  • 13:40 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2003.codfw.wmnet
  • 13:40 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2003.codfw.wmnet
  • 13:38 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 13:38 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 13:35 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2003.codfw.wmnet
  • 13:35 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2003.codfw.wmnet
  • 13:34 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 13:34 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2379 to wikikube-worker2059
  • 13:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2378 to wikikube-worker2058
  • 13:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2058
  • 13:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T370903)', diff saved to https://phabricator.wikimedia.org/P68307 and previous config saved to /var/cache/conftool/dbconfig/20240830-133201-ladsgroup.json
  • 13:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 13:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T370903)', diff saved to https://phabricator.wikimedia.org/P68306 and previous config saved to /var/cache/conftool/dbconfig/20240830-133139-ladsgroup.json
  • 13:31 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2058
  • 13:31 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:31 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2378 to wikikube-worker2058 - akosiaris@cumin1002"
  • 13:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T371742)', diff saved to https://phabricator.wikimedia.org/P68305 and previous config saved to /var/cache/conftool/dbconfig/20240830-132750-ladsgroup.json
  • 13:27 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2378 to wikikube-worker2058 - akosiaris@cumin1002"
  • 13:27 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker2001.codfw.wmnet
  • 13:27 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2001.codfw.wmnet
  • 13:26 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-ctrl2003.codfw.wmnet
  • 13:26 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-ctrl2003.codfw.wmnet
  • 13:21 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 13:21 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2378 to wikikube-worker2058
  • 13:21 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-ctrl2003.codfw.wmnet
  • 13:21 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-ctrl2003.codfw.wmnet
  • 13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P68304 and previous config saved to /var/cache/conftool/dbconfig/20240830-131631-ladsgroup.json
  • 13:04 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2377 to wikikube-worker2057
  • 13:04 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2057
  • 13:04 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2057
  • 13:04 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:04 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2377 to wikikube-worker2057 - akosiaris@cumin1002"
  • 13:02 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2377 to wikikube-worker2057 - akosiaris@cumin1002"
  • 13:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P68303 and previous config saved to /var/cache/conftool/dbconfig/20240830-130124-ladsgroup.json
  • 12:59 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 12:59 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2377 to wikikube-worker2057
  • 12:56 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2056.codfw.wmnet with OS bullseye
  • 12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T370903)', diff saved to https://phabricator.wikimedia.org/P68302 and previous config saved to /var/cache/conftool/dbconfig/20240830-124617-ladsgroup.json
  • 12:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2056.codfw.wmnet with reason: host reimage
  • 12:33 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2056.codfw.wmnet with reason: host reimage
  • 12:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2055.codfw.wmnet
  • 12:27 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2055.codfw.wmnet
  • 12:25 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2055.codfw.wmnet with OS bullseye
  • 12:24 hnowlan: homer 'lsw1-a3-codfw*' commit
  • 12:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T371742)', diff saved to https://phabricator.wikimedia.org/P68301 and previous config saved to /var/cache/conftool/dbconfig/20240830-122139-ladsgroup.json
  • 12:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T371742)', diff saved to https://phabricator.wikimedia.org/P68300 and previous config saved to /var/cache/conftool/dbconfig/20240830-122106-ladsgroup.json
  • 12:20 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2379.codfw.wmnet
  • 12:19 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2379.codfw.wmnet
  • 12:19 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2378.codfw.wmnet
  • 12:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2056
  • 12:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2056
  • 12:17 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2056
  • 12:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2056.codfw.wmnet 45.0.192.10.in-addr.arpa 5.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:17 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2056.codfw.wmnet 45.0.192.10.in-addr.arpa 5.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2056 - hnowlan@cumin1002"
  • 12:17 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2056 - hnowlan@cumin1002"
  • 12:16 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2378.codfw.wmnet
  • 12:16 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2377.codfw.wmnet
  • 12:15 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2377.codfw.wmnet
  • 12:13 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 12:13 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2056
  • 12:13 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2056.codfw.wmnet with OS bullseye
  • 12:12 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2056.codfw.wmnet on all recursors
  • 12:12 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2056.codfw.wmnet on all recursors
  • 12:11 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2055.codfw.wmnet on all recursors
  • 12:11 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2055.codfw.wmnet on all recursors
  • 12:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2382 to wikikube-worker2056
  • 12:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2056
  • 12:08 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2056
  • 12:08 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:08 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2382 to wikikube-worker2056 - hnowlan@cumin1002"
  • 12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T370903)', diff saved to https://phabricator.wikimedia.org/P68299 and previous config saved to /var/cache/conftool/dbconfig/20240830-120742-ladsgroup.json
  • 12:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T370903)', diff saved to https://phabricator.wikimedia.org/P68298 and previous config saved to /var/cache/conftool/dbconfig/20240830-120720-ladsgroup.json
  • 12:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2055.codfw.wmnet with reason: host reimage
  • 12:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P68297 and previous config saved to /var/cache/conftool/dbconfig/20240830-120559-ladsgroup.json
  • 12:04 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2382 to wikikube-worker2056 - hnowlan@cumin1002"
  • 12:02 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2055.codfw.wmnet with reason: host reimage
  • 12:01 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 12:00 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2382 to wikikube-worker2056
  • 11:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2382.codfw.wmnet
  • 11:56 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2382.codfw.wmnet
  • 11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P68296 and previous config saved to /var/cache/conftool/dbconfig/20240830-115213-ladsgroup.json
  • 11:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P68295 and previous config saved to /var/cache/conftool/dbconfig/20240830-115052-ladsgroup.json
  • 11:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2055
  • 11:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2055
  • 11:46 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2055
  • 11:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2055.codfw.wmnet 44.0.192.10.in-addr.arpa 4.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:46 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2055.codfw.wmnet 44.0.192.10.in-addr.arpa 4.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2055 - hnowlan@cumin1002"
  • 11:46 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2055 - hnowlan@cumin1002"
  • 11:42 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 11:42 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2055
  • 11:41 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2055.codfw.wmnet with OS bullseye
  • 11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2381 to wikikube-worker2055
  • 11:39 hnowlan@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2055
  • 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P68294 and previous config saved to /var/cache/conftool/dbconfig/20240830-113706-ladsgroup.json
  • 11:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T371742)', diff saved to https://phabricator.wikimedia.org/P68293 and previous config saved to /var/cache/conftool/dbconfig/20240830-113544-ladsgroup.json
  • 11:35 hnowlan@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2055
  • 11:35 hnowlan@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:35 hnowlan@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2381 to wikikube-worker2055 - hnowlan@cumin2002"
  • 11:34 hnowlan@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2381 to wikikube-worker2055 - hnowlan@cumin2002"
  • 11:29 hnowlan@cumin2002: START - Cookbook sre.dns.netbox
  • 11:28 hnowlan@cumin2002: START - Cookbook sre.hosts.rename from mw2381 to wikikube-worker2055
  • 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T370903)', diff saved to https://phabricator.wikimedia.org/P68292 and previous config saved to /var/cache/conftool/dbconfig/20240830-112159-ladsgroup.json
  • 11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T371742)', diff saved to https://phabricator.wikimedia.org/P68291 and previous config saved to /var/cache/conftool/dbconfig/20240830-110426-ladsgroup.json
  • 11:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T370903)', diff saved to https://phabricator.wikimedia.org/P68290 and previous config saved to /var/cache/conftool/dbconfig/20240830-110334-ladsgroup.json
  • 11:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2381.codfw.wmnet
  • 10:44 Emperor: restart swift-proxy on ms-fe2009 and ms-fe2014 T360913
  • 10:44 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2381.codfw.wmnet
  • 10:27 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2054.codfw.wmnet
  • 10:27 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2054.codfw.wmnet
  • 10:27 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2053.codfw.wmnet
  • 10:27 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2053.codfw.wmnet
  • 10:27 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2052.codfw.wmnet
  • 10:27 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2052.codfw.wmnet
  • 10:06 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: sync
  • 10:04 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: sync
  • 09:59 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: sync
  • 09:58 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/proton: sync
  • 09:56 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: sync
  • 09:55 elukey@deploy1003: helmfile [staging] START helmfile.d/services/proton: sync
  • 09:51 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2054.codfw.wmnet with OS bullseye
  • 09:45 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2053.codfw.wmnet with OS bullseye
  • 09:43 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 09:43 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2052.codfw.wmnet with OS bullseye
  • 09:42 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:39 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 09:31 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2054.codfw.wmnet with reason: host reimage
  • 09:27 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 09:27 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2054.codfw.wmnet with reason: host reimage
  • 09:25 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2053.codfw.wmnet with reason: host reimage
  • 09:22 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2053.codfw.wmnet with reason: host reimage
  • 09:22 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2052.codfw.wmnet with reason: host reimage
  • 09:21 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 09:19 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2052.codfw.wmnet with reason: host reimage
  • 09:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2054
  • 09:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2054
  • 09:10 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2054
  • 09:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2054.codfw.wmnet 167.0.192.10.in-addr.arpa 7.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:10 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2054.codfw.wmnet 167.0.192.10.in-addr.arpa 7.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2054 - akosiaris@cumin1002"
  • 09:10 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2054 - akosiaris@cumin1002"
  • 09:07 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 09:07 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2054
  • 09:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2053
  • 09:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2053
  • 09:06 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2053
  • 09:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2053.codfw.wmnet 166.0.192.10.in-addr.arpa 6.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:06 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2053.codfw.wmnet 166.0.192.10.in-addr.arpa 6.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2053 - akosiaris@cumin1002"
  • 09:06 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2053 - akosiaris@cumin1002"
  • 09:04 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2054.codfw.wmnet with OS bullseye
  • 09:03 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 09:03 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2053
  • 09:03 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2052
  • 09:03 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2052
  • 09:03 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2052
  • 09:02 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2052.codfw.wmnet 165.0.192.10.in-addr.arpa 5.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:02 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2052.codfw.wmnet 165.0.192.10.in-addr.arpa 5.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:02 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:02 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2052 - akosiaris@cumin1002"
  • 09:02 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2052 - akosiaris@cumin1002"
  • 09:02 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2053.codfw.wmnet with OS bullseye
  • 08:59 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 08:59 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2052
  • 08:59 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2052.codfw.wmnet with OS bullseye
  • 08:56 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2297 to wikikube-worker2054
  • 08:55 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2054
  • 08:55 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2054
  • 08:55 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:55 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2297 to wikikube-worker2054 - akosiaris@cumin1002"
  • 08:52 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2297 to wikikube-worker2054 - akosiaris@cumin1002"
  • 08:50 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided) (duration: 00m 41s)
  • 08:50 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 08:50 elukey@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 08:50 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided)
  • 08:48 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided) (duration: 00m 20s)
  • 08:47 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided)
  • 08:37 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 08:37 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2297 to wikikube-worker2054
  • 08:36 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2296 to wikikube-worker2053
  • 08:36 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2053
  • 08:36 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2053
  • 08:36 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:36 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2296 to wikikube-worker2053 - akosiaris@cumin1002"
  • 08:35 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2296 to wikikube-worker2053 - akosiaris@cumin1002"
  • 08:26 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 08:26 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2296 to wikikube-worker2053
  • 08:24 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2295 to wikikube-worker2052
  • 08:24 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2052
  • 08:23 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2052
  • 08:23 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:23 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2295 to wikikube-worker2052 - akosiaris@cumin1002"
  • 08:23 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2295 to wikikube-worker2052 - akosiaris@cumin1002"
  • 07:36 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 07:36 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2295 to wikikube-worker2052
  • 07:22 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52965
  • 07:22 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 52965
  • 07:11 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:11 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:11 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:11 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:11 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:10 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 07:10 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 07:10 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 07:10 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:10 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 07:10 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:09 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:09 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:09 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:09 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 07:09 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:09 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 07:08 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 05:35 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@0321fda]: (no justification provided) (duration: 00m 32s)
  • 05:34 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@0321fda]: (no justification provided)
  • 04:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68289 and previous config saved to /var/cache/conftool/dbconfig/20240830-045519-ladsgroup.json
  • 04:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P68288 and previous config saved to /var/cache/conftool/dbconfig/20240830-044012-ladsgroup.json
  • 04:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P68287 and previous config saved to /var/cache/conftool/dbconfig/20240830-042505-ladsgroup.json
  • 04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68286 and previous config saved to /var/cache/conftool/dbconfig/20240830-040957-ladsgroup.json
  • 04:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68285 and previous config saved to /var/cache/conftool/dbconfig/20240830-040055-ladsgroup.json
  • 04:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 04:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T370903)', diff saved to https://phabricator.wikimedia.org/P68284 and previous config saved to /var/cache/conftool/dbconfig/20240830-035123-ladsgroup.json
  • 03:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P68283 and previous config saved to /var/cache/conftool/dbconfig/20240830-033616-ladsgroup.json
  • 03:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P68282 and previous config saved to /var/cache/conftool/dbconfig/20240830-032109-ladsgroup.json
  • 03:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T370903)', diff saved to https://phabricator.wikimedia.org/P68281 and previous config saved to /var/cache/conftool/dbconfig/20240830-030602-ladsgroup.json
  • 02:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T370903)', diff saved to https://phabricator.wikimedia.org/P68280 and previous config saved to /var/cache/conftool/dbconfig/20240830-025809-ladsgroup.json
  • 02:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 02:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 02:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68279 and previous config saved to /var/cache/conftool/dbconfig/20240830-025747-ladsgroup.json
  • 02:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P68278 and previous config saved to /var/cache/conftool/dbconfig/20240830-024239-ladsgroup.json
  • 02:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P68277 and previous config saved to /var/cache/conftool/dbconfig/20240830-022732-ladsgroup.json
  • 02:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68276 and previous config saved to /var/cache/conftool/dbconfig/20240830-021225-ladsgroup.json
  • 02:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68275 and previous config saved to /var/cache/conftool/dbconfig/20240830-020606-ladsgroup.json
  • 02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68274 and previous config saved to /var/cache/conftool/dbconfig/20240830-020305-ladsgroup.json
  • 02:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 02:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 02:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T370903)', diff saved to https://phabricator.wikimedia.org/P68273 and previous config saved to /var/cache/conftool/dbconfig/20240830-020243-ladsgroup.json
  • 01:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P68272 and previous config saved to /var/cache/conftool/dbconfig/20240830-015059-ladsgroup.json
  • 01:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P68271 and previous config saved to /var/cache/conftool/dbconfig/20240830-014736-ladsgroup.json
  • 01:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P68270 and previous config saved to /var/cache/conftool/dbconfig/20240830-013551-ladsgroup.json
  • 01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P68269 and previous config saved to /var/cache/conftool/dbconfig/20240830-013229-ladsgroup.json
  • 01:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68268 and previous config saved to /var/cache/conftool/dbconfig/20240830-012044-ladsgroup.json
  • 01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T370903)', diff saved to https://phabricator.wikimedia.org/P68267 and previous config saved to /var/cache/conftool/dbconfig/20240830-011721-ladsgroup.json
  • 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T370903)', diff saved to https://phabricator.wikimedia.org/P68266 and previous config saved to /var/cache/conftool/dbconfig/20240830-010823-ladsgroup.json
  • 01:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 01:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T370903)', diff saved to https://phabricator.wikimedia.org/P68265 and previous config saved to /var/cache/conftool/dbconfig/20240830-010801-ladsgroup.json
  • 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68264 and previous config saved to /var/cache/conftool/dbconfig/20240830-005534-ladsgroup.json
  • 00:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 00:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T371742)', diff saved to https://phabricator.wikimedia.org/P68263 and previous config saved to /var/cache/conftool/dbconfig/20240830-005512-ladsgroup.json
  • 00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P68262 and previous config saved to /var/cache/conftool/dbconfig/20240830-005254-ladsgroup.json
  • 00:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P68261 and previous config saved to /var/cache/conftool/dbconfig/20240830-004004-ladsgroup.json
  • 00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P68260 and previous config saved to /var/cache/conftool/dbconfig/20240830-003746-ladsgroup.json
  • 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P68259 and previous config saved to /var/cache/conftool/dbconfig/20240830-002457-ladsgroup.json
  • 00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T370903)', diff saved to https://phabricator.wikimedia.org/P68258 and previous config saved to /var/cache/conftool/dbconfig/20240830-002239-ladsgroup.json
  • 00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T370903)', diff saved to https://phabricator.wikimedia.org/P68255 and previous config saved to /var/cache/conftool/dbconfig/20240830-001353-ladsgroup.json
  • 00:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 00:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T370903)', diff saved to https://phabricator.wikimedia.org/P68254 and previous config saved to /var/cache/conftool/dbconfig/20240830-001331-ladsgroup.json
  • 00:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T371742)', diff saved to https://phabricator.wikimedia.org/P68253 and previous config saved to /var/cache/conftool/dbconfig/20240830-000950-ladsgroup.json

2024-08-29

  • 23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P68252 and previous config saved to /var/cache/conftool/dbconfig/20240829-235824-ladsgroup.json
  • 23:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T371742)', diff saved to https://phabricator.wikimedia.org/P68251 and previous config saved to /var/cache/conftool/dbconfig/20240829-234420-ladsgroup.json
  • 23:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 23:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 23:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P68250 and previous config saved to /var/cache/conftool/dbconfig/20240829-234317-ladsgroup.json
  • 23:33 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T370903)', diff saved to https://phabricator.wikimedia.org/P68249 and previous config saved to /var/cache/conftool/dbconfig/20240829-232810-ladsgroup.json
  • 23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T370903)', diff saved to https://phabricator.wikimedia.org/P68248 and previous config saved to /var/cache/conftool/dbconfig/20240829-232548-ladsgroup.json
  • 23:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T370903)', diff saved to https://phabricator.wikimedia.org/P68247 and previous config saved to /var/cache/conftool/dbconfig/20240829-232510-ladsgroup.json
  • 23:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 23:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 23:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T371742)', diff saved to https://phabricator.wikimedia.org/P68246 and previous config saved to /var/cache/conftool/dbconfig/20240829-232124-ladsgroup.json
  • 23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P68245 and previous config saved to /var/cache/conftool/dbconfig/20240829-231003-ladsgroup.json
  • 23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P68244 and previous config saved to /var/cache/conftool/dbconfig/20240829-230616-ladsgroup.json
  • 22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P68243 and previous config saved to /var/cache/conftool/dbconfig/20240829-225456-ladsgroup.json
  • 22:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P68242 and previous config saved to /var/cache/conftool/dbconfig/20240829-225109-ladsgroup.json
  • 22:45 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 22:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 22:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T370903)', diff saved to https://phabricator.wikimedia.org/P68241 and previous config saved to /var/cache/conftool/dbconfig/20240829-223949-ladsgroup.json
  • 22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T371742)', diff saved to https://phabricator.wikimedia.org/P68240 and previous config saved to /var/cache/conftool/dbconfig/20240829-223602-ladsgroup.json
  • 22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T370903)', diff saved to https://phabricator.wikimedia.org/P68239 and previous config saved to /var/cache/conftool/dbconfig/20240829-222824-ladsgroup.json
  • 22:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T370903)', diff saved to https://phabricator.wikimedia.org/P68238 and previous config saved to /var/cache/conftool/dbconfig/20240829-222048-ladsgroup.json
  • 22:19 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bookworm
  • 22:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T371742)', diff saved to https://phabricator.wikimedia.org/P68237 and previous config saved to /var/cache/conftool/dbconfig/20240829-221559-ladsgroup.json
  • 22:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 22:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T371742)', diff saved to https://phabricator.wikimedia.org/P68236 and previous config saved to /var/cache/conftool/dbconfig/20240829-221537-ladsgroup.json
  • 22:10 zabe: zabe@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php testwiki # T183490
  • 22:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P68235 and previous config saved to /var/cache/conftool/dbconfig/20240829-220541-ladsgroup.json
  • 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P68234 and previous config saved to /var/cache/conftool/dbconfig/20240829-220030-ladsgroup.json
  • 21:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 21:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P68233 and previous config saved to /var/cache/conftool/dbconfig/20240829-215034-ladsgroup.json
  • 21:50 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P68232 and previous config saved to /var/cache/conftool/dbconfig/20240829-214523-ladsgroup.json
  • 21:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 21:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 21:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T370903)', diff saved to https://phabricator.wikimedia.org/P68231 and previous config saved to /var/cache/conftool/dbconfig/20240829-213526-ladsgroup.json
  • 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T371742)', diff saved to https://phabricator.wikimedia.org/P68230 and previous config saved to /var/cache/conftool/dbconfig/20240829-213015-ladsgroup.json
  • 21:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T370903)', diff saved to https://phabricator.wikimedia.org/P68229 and previous config saved to /var/cache/conftool/dbconfig/20240829-212727-ladsgroup.json
  • 21:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 21:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 21:19 cmooney@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2002.codfw.wmnet with OS bookworm
  • 21:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 21:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 21:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T370903)', diff saved to https://phabricator.wikimedia.org/P68228 and previous config saved to /var/cache/conftool/dbconfig/20240829-211642-ladsgroup.json
  • 21:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1009.eqiad.wmnet with reason: host reimage
  • 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T371742)', diff saved to https://phabricator.wikimedia.org/P68227 and previous config saved to /var/cache/conftool/dbconfig/20240829-210822-ladsgroup.json
  • 21:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T371742)', diff saved to https://phabricator.wikimedia.org/P68226 and previous config saved to /var/cache/conftool/dbconfig/20240829-210759-ladsgroup.json
  • 21:07 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1009.eqiad.wmnet with reason: host reimage
  • 21:04 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 21:03 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P68225 and previous config saved to /var/cache/conftool/dbconfig/20240829-210135-ladsgroup.json
  • 20:56 urbanecm@deploy1003: Finished scap sync-world: Backport for Turn on Parsoid Read Views for eo/sv/fi wikivoyage (T372810), Add project talk aliases for mnwiki (T366271) (duration: 13m 16s)
  • 20:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 20:54 eileen: civicrm upgraded from 916cad45 to 27b1f673
  • 20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P68224 and previous config saved to /var/cache/conftool/dbconfig/20240829-205252-ladsgroup.json
  • 20:51 urbanecm@deploy1003: urbanecm, srishakatux, cscott: Continuing with sync
  • 20:49 cmooney@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2002.codfw.wmnet with OS bookworm
  • 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P68223 and previous config saved to /var/cache/conftool/dbconfig/20240829-204628-ladsgroup.json
  • 20:44 urbanecm@deploy1003: urbanecm, srishakatux, cscott: Backport for Turn on Parsoid Read Views for eo/sv/fi wikivoyage (T372810), Add project talk aliases for mnwiki (T366271) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:42 urbanecm@deploy1003: Started scap sync-world: Backport for Turn on Parsoid Read Views for eo/sv/fi wikivoyage (T372810), Add project talk aliases for mnwiki (T366271)
  • 20:42 urbanecm@deploy1003: Finished scap sync-world: Backport for kuswiki: add custom logos (T368868), bewwiki: add custom logos (T368868), Enable AutoModerator on id.wiki (T365792) (duration: 07m 50s)
  • 20:38 urbanecm@deploy1003: kgraessle, urbanecm, chlod: Continuing with sync
  • 20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P68222 and previous config saved to /var/cache/conftool/dbconfig/20240829-203745-ladsgroup.json
  • 20:36 urbanecm@deploy1003: kgraessle, urbanecm, chlod: Backport for kuswiki: add custom logos (T368868), bewwiki: add custom logos (T368868), Enable AutoModerator on id.wiki (T365792) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:34 urbanecm@deploy1003: Started scap sync-world: Backport for kuswiki: add custom logos (T368868), bewwiki: add custom logos (T368868), Enable AutoModerator on id.wiki (T365792)
  • 20:33 urbanecm@deploy1003: Finished scap sync-world: Backport for kawikisource: re-add custom logos (T368868), kaawiktionary: re-add custom logos (T368868), iglwiki: add custom logos (T368868), mywikisource: add custom logos (T368868) (duration: 10m 48s)
  • 20:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T370903)', diff saved to https://phabricator.wikimedia.org/P68221 and previous config saved to /var/cache/conftool/dbconfig/20240829-203120-ladsgroup.json
  • 20:28 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 20:28 urbanecm@deploy1003: urbanecm, chlod: Continuing with sync
  • 20:28 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 20:26 urbanecm@deploy1003: urbanecm, chlod: Backport for kawikisource: re-add custom logos (T368868), kaawiktionary: re-add custom logos (T368868), iglwiki: add custom logos (T368868), mywikisource: add custom logos (T368868) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:23 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 20:22 urbanecm@deploy1003: Started scap sync-world: Backport for kawikisource: re-add custom logos (T368868), kaawiktionary: re-add custom logos (T368868), iglwiki: add custom logos (T368868), mywikisource: add custom logos (T368868)
  • 20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T371742)', diff saved to https://phabricator.wikimedia.org/P68220 and previous config saved to /var/cache/conftool/dbconfig/20240829-202238-ladsgroup.json
  • 20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T370903)', diff saved to https://phabricator.wikimedia.org/P68219 and previous config saved to /var/cache/conftool/dbconfig/20240829-202231-ladsgroup.json
  • 20:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 20:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T370903)', diff saved to https://phabricator.wikimedia.org/P68218 and previous config saved to /var/cache/conftool/dbconfig/20240829-202209-ladsgroup.json
  • 20:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2002 - cmooney@cumin1002"
  • 20:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2002 - cmooney@cumin1002"
  • 20:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:17 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2002.codfw.wmnet
  • 20:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P68217 and previous config saved to /var/cache/conftool/dbconfig/20240829-200701-ladsgroup.json
  • 19:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T371742)', diff saved to https://phabricator.wikimedia.org/P68216 and previous config saved to /var/cache/conftool/dbconfig/20240829-195609-ladsgroup.json
  • 19:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T371742)', diff saved to https://phabricator.wikimedia.org/P68215 and previous config saved to /var/cache/conftool/dbconfig/20240829-195547-ladsgroup.json
  • 19:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P68214 and previous config saved to /var/cache/conftool/dbconfig/20240829-195154-ladsgroup.json
  • 19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P68213 and previous config saved to /var/cache/conftool/dbconfig/20240829-194040-ladsgroup.json
  • 19:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T370903)', diff saved to https://phabricator.wikimedia.org/P68212 and previous config saved to /var/cache/conftool/dbconfig/20240829-193647-ladsgroup.json
  • 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T370903)', diff saved to https://phabricator.wikimedia.org/P68211 and previous config saved to /var/cache/conftool/dbconfig/20240829-193436-ladsgroup.json
  • 19:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P68210 and previous config saved to /var/cache/conftool/dbconfig/20240829-192533-ladsgroup.json
  • 19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T370903)', diff saved to https://phabricator.wikimedia.org/P68209 and previous config saved to /var/cache/conftool/dbconfig/20240829-192409-ladsgroup.json
  • 19:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T371742)', diff saved to https://phabricator.wikimedia.org/P68208 and previous config saved to /var/cache/conftool/dbconfig/20240829-191026-ladsgroup.json
  • 19:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P68207 and previous config saved to /var/cache/conftool/dbconfig/20240829-190902-ladsgroup.json
  • 19:06 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2002.codfw.wmnet
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P68206 and previous config saved to /var/cache/conftool/dbconfig/20240829-185355-ladsgroup.json
  • 18:52 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2051.codfw.wmnet
  • 18:52 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2051.codfw.wmnet
  • 18:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T371742)', diff saved to https://phabricator.wikimedia.org/P68205 and previous config saved to /var/cache/conftool/dbconfig/20240829-184242-ladsgroup.json
  • 18:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T371742)', diff saved to https://phabricator.wikimedia.org/P68204 and previous config saved to /var/cache/conftool/dbconfig/20240829-184220-ladsgroup.json
  • 18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T370903)', diff saved to https://phabricator.wikimedia.org/P68203 and previous config saved to /var/cache/conftool/dbconfig/20240829-183848-ladsgroup.json
  • 18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T370903)', diff saved to https://phabricator.wikimedia.org/P68202 and previous config saved to /var/cache/conftool/dbconfig/20240829-183638-ladsgroup.json
  • 18:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 18:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68201 and previous config saved to /var/cache/conftool/dbconfig/20240829-183616-ladsgroup.json
  • 18:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P68200 and previous config saved to /var/cache/conftool/dbconfig/20240829-182713-ladsgroup.json
  • 18:24 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@abb06c4]: Deploy latest Analytics Airflow DAGs to pick up T373402 (duration: 00m 42s)
  • 18:23 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@abb06c4]: Deploy latest Analytics Airflow DAGs to pick up T373402
  • 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P68199 and previous config saved to /var/cache/conftool/dbconfig/20240829-182108-ladsgroup.json
  • 18:15 kamila_: running homer after wikikube-worker2051 rename
  • 18:12 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2051.codfw.wmnet with OS bullseye
  • 18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P68198 and previous config saved to /var/cache/conftool/dbconfig/20240829-181205-ladsgroup.json
  • 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P68197 and previous config saved to /var/cache/conftool/dbconfig/20240829-180601-ladsgroup.json
  • 17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T371742)', diff saved to https://phabricator.wikimedia.org/P68196 and previous config saved to /var/cache/conftool/dbconfig/20240829-175658-ladsgroup.json
  • 17:51 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2051.codfw.wmnet with reason: host reimage
  • 17:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68195 and previous config saved to /var/cache/conftool/dbconfig/20240829-175053-ladsgroup.json
  • 17:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68194 and previous config saved to /var/cache/conftool/dbconfig/20240829-174842-ladsgroup.json
  • 17:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 17:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 17:48 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2051.codfw.wmnet with reason: host reimage
  • 17:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T370903)', diff saved to https://phabricator.wikimedia.org/P68193 and previous config saved to /var/cache/conftool/dbconfig/20240829-174820-ladsgroup.json
  • 17:39 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T371742)', diff saved to https://phabricator.wikimedia.org/P68192 and previous config saved to /var/cache/conftool/dbconfig/20240829-173416-ladsgroup.json
  • 17:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P68191 and previous config saved to /var/cache/conftool/dbconfig/20240829-173313-ladsgroup.json
  • 17:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T367856)', diff saved to https://phabricator.wikimedia.org/P68190 and previous config saved to /var/cache/conftool/dbconfig/20240829-173303-marostegui.json
  • 17:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 17:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 17:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T367856)', diff saved to https://phabricator.wikimedia.org/P68189 and previous config saved to /var/cache/conftool/dbconfig/20240829-173240-marostegui.json
  • 17:32 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2051
  • 17:32 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2051
  • 17:31 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2051
  • 17:31 kamila@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2051.codfw.wmnet 65.0.192.10.in-addr.arpa 5.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:31 kamila@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2051.codfw.wmnet 65.0.192.10.in-addr.arpa 5.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:31 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:31 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2051 - kamila@cumin1002"
  • 17:31 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2051 - kamila@cumin1002"
  • 17:28 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 17:27 kamila@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2051
  • 17:27 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2051.codfw.wmnet with OS bullseye
  • 17:27 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2401 to wikikube-worker2051
  • 17:26 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2051
  • 17:26 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2051
  • 17:26 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:26 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2401 to wikikube-worker2051 - kamila@cumin1002"
  • 17:25 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2401 to wikikube-worker2051 - kamila@cumin1002"
  • 17:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 17:21 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 17:21 kamila@cumin1002: START - Cookbook sre.hosts.rename from mw2401 to wikikube-worker2051
  • 17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P68188 and previous config saved to /var/cache/conftool/dbconfig/20240829-171759-ladsgroup.json
  • 17:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P68187 and previous config saved to /var/cache/conftool/dbconfig/20240829-171733-marostegui.json
  • 17:17 kamila@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host mw2401.codfw.wmnet
  • 17:16 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2401.codfw.wmnet
  • 17:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T370903)', diff saved to https://phabricator.wikimedia.org/P68186 and previous config saved to /var/cache/conftool/dbconfig/20240829-170252-ladsgroup.json
  • 17:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P68185 and previous config saved to /var/cache/conftool/dbconfig/20240829-170224-marostegui.json
  • 16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T370903)', diff saved to https://phabricator.wikimedia.org/P68184 and previous config saved to /var/cache/conftool/dbconfig/20240829-165341-ladsgroup.json
  • 16:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T370903)', diff saved to https://phabricator.wikimedia.org/P68183 and previous config saved to /var/cache/conftool/dbconfig/20240829-165319-ladsgroup.json
  • 16:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T367856)', diff saved to https://phabricator.wikimedia.org/P68182 and previous config saved to /var/cache/conftool/dbconfig/20240829-164717-marostegui.json
  • 16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P68180 and previous config saved to /var/cache/conftool/dbconfig/20240829-163811-ladsgroup.json
  • 16:27 topranks: update qos configuration for asw2-ulsfo to use traffic-control profile T373594
  • 16:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68179 and previous config saved to /var/cache/conftool/dbconfig/20240829-162601-ladsgroup.json
  • 16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P68178 and previous config saved to /var/cache/conftool/dbconfig/20240829-162304-ladsgroup.json
  • 16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P68177 and previous config saved to /var/cache/conftool/dbconfig/20240829-161054-ladsgroup.json
  • 16:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T370903)', diff saved to https://phabricator.wikimedia.org/P68176 and previous config saved to /var/cache/conftool/dbconfig/20240829-160757-ladsgroup.json
  • 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T370903)', diff saved to https://phabricator.wikimedia.org/P68175 and previous config saved to /var/cache/conftool/dbconfig/20240829-160447-ladsgroup.json
  • 16:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68174 and previous config saved to /var/cache/conftool/dbconfig/20240829-160425-ladsgroup.json
  • 16:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 15:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P68173 and previous config saved to /var/cache/conftool/dbconfig/20240829-155547-ladsgroup.json
  • 15:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P68172 and previous config saved to /var/cache/conftool/dbconfig/20240829-155431-ladsgroup.json
  • 15:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P68171 and previous config saved to /var/cache/conftool/dbconfig/20240829-154917-ladsgroup.json
  • 15:42 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
  • 15:42 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
  • 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68170 and previous config saved to /var/cache/conftool/dbconfig/20240829-154040-ladsgroup.json
  • 15:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P68169 and previous config saved to /var/cache/conftool/dbconfig/20240829-153925-ladsgroup.json
  • 15:39 sukhe: re-enable puppet on lvs4010
  • 15:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:35 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P68168 and previous config saved to /var/cache/conftool/dbconfig/20240829-153410-ladsgroup.json
  • 15:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 15:33 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 15:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:30 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 15:29 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P68167 and previous config saved to /var/cache/conftool/dbconfig/20240829-152419-ladsgroup.json
  • 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68166 and previous config saved to /var/cache/conftool/dbconfig/20240829-152058-ladsgroup.json
  • 15:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 15:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T371742)', diff saved to https://phabricator.wikimedia.org/P68165 and previous config saved to /var/cache/conftool/dbconfig/20240829-152036-ladsgroup.json
  • 15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68164 and previous config saved to /var/cache/conftool/dbconfig/20240829-151903-ladsgroup.json
  • 15:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-lab1002.eqiad.wmnet with OS bookworm
  • 15:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:11 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:10 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:10 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68163 and previous config saved to /var/cache/conftool/dbconfig/20240829-151000-ladsgroup.json
  • 15:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:09 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:09 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:09 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T370903)', diff saved to https://phabricator.wikimedia.org/P68162 and previous config saved to /var/cache/conftool/dbconfig/20240829-150846-ladsgroup.json
  • 15:08 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2297.codfw.wmnet
  • 15:07 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2297.codfw.wmnet
  • 15:07 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2296.codfw.wmnet
  • 15:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 15:07 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2296.codfw.wmnet
  • 15:07 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2295.codfw.wmnet
  • 15:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
  • 15:06 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2295.codfw.wmnet
  • 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P68161 and previous config saved to /var/cache/conftool/dbconfig/20240829-150529-ladsgroup.json
  • 15:04 mutante: releases* - temp disable puppet, maintenance for java version upgrade
  • 15:04 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
  • 15:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd1003.eqiad.wmnet with OS bookworm
  • 15:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:03 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
  • 14:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd1002.eqiad.wmnet with OS bookworm
  • 14:59 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:59 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
  • 14:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: testing T358260
  • 14:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd1004.eqiad.wmnet with OS bookworm
  • 14:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:56 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: testing T358260
  • 14:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2014.codfw.wmnet
  • 14:56 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2014.codfw.wmnet
  • 14:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: testing T358260
  • 14:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:55 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: testing T358260
  • 14:55 sukhe: downtiming lvs4010 to test T358260
  • 14:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd1001.eqiad.wmnet with OS bookworm
  • 14:53 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:52 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P68160 and previous config saved to /var/cache/conftool/dbconfig/20240829-145021-ladsgroup.json
  • 14:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd1003.eqiad.wmnet with reason: host reimage
  • 14:42 jgiannelos@deploy1003: Finished deploy [restbase/deploy@5a4727a]: (no justification provided) (duration: 16m 35s)
  • 14:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd1002.eqiad.wmnet with reason: host reimage
  • 14:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
  • 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 14:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1002.eqiad.wmnet with OS bookworm
  • 14:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd1004.eqiad.wmnet with reason: host reimage
  • 14:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd1001.eqiad.wmnet with reason: host reimage
  • 14:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T371742)', diff saved to https://phabricator.wikimedia.org/P68159 and previous config saved to /var/cache/conftool/dbconfig/20240829-143514-ladsgroup.json
  • 14:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd1004.eqiad.wmnet with reason: host reimage
  • 14:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd1003.eqiad.wmnet with reason: host reimage
  • 14:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd1002.eqiad.wmnet with reason: host reimage
  • 14:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd1001.eqiad.wmnet with reason: host reimage
  • 14:25 jgiannelos@deploy1003: Started deploy [restbase/deploy@5a4727a]: (no justification provided)
  • 13:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2050.codfw.wmnet
  • 13:59 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2050.codfw.wmnet
  • 13:55 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P68154 and previous config saved to /var/cache/conftool/dbconfig/20240829-135537-ladsgroup.json
  • 13:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P68153 and previous config saved to /var/cache/conftool/dbconfig/20240829-135430-ladsgroup.json
  • 13:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
  • 13:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1002.eqiad.wmnet with OS bookworm
  • 13:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host logging-sd1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host logging-sd1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host logging-sd1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:50 topranks: add qos interface schedulers on lsw1-d4-codfw T339850
  • 13:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host logging-sd1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:49 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt logging-sd1 - jclark@cumin1002"
  • 13:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt logging-sd1 - jclark@cumin1002"
  • 13:44 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P68152 and previous config saved to /var/cache/conftool/dbconfig/20240829-134030-ladsgroup.json
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P68151 and previous config saved to /var/cache/conftool/dbconfig/20240829-133923-ladsgroup.json
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
  • 13:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 13:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T370903)', diff saved to https://phabricator.wikimedia.org/P68150 and previous config saved to /var/cache/conftool/dbconfig/20240829-132523-ladsgroup.json
  • 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T371742)', diff saved to https://phabricator.wikimedia.org/P68149 and previous config saved to /var/cache/conftool/dbconfig/20240829-132416-ladsgroup.json
  • 13:13 samtar@deploy1003: Finished scap sync-world: Backport for Activate feature flag for moving wikibase item to Other Projects sidebar in pilot wikis. (duration: 10m 28s)
  • 13:08 samtar@deploy1003: joelyrookewmde, samtar: Continuing with sync
  • 13:06 samtar@deploy1003: joelyrookewmde, samtar: Backport for Activate feature flag for moving wikibase item to Other Projects sidebar in pilot wikis. synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:02 samtar@deploy1003: Started scap sync-world: Backport for Activate feature flag for moving wikibase item to Other Projects sidebar in pilot wikis.
  • 13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T371742)', diff saved to https://phabricator.wikimedia.org/P68148 and previous config saved to /var/cache/conftool/dbconfig/20240829-130029-ladsgroup.json
  • 13:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T371742)', diff saved to https://phabricator.wikimedia.org/P68147 and previous config saved to /var/cache/conftool/dbconfig/20240829-130006-ladsgroup.json
  • 12:51 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow (duration: 00m 09s)
  • 12:51 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow
  • 12:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P68146 and previous config saved to /var/cache/conftool/dbconfig/20240829-124459-ladsgroup.json
  • 12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P68145 and previous config saved to /var/cache/conftool/dbconfig/20240829-122951-ladsgroup.json
  • 12:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T370903)', diff saved to https://phabricator.wikimedia.org/P68144 and previous config saved to /var/cache/conftool/dbconfig/20240829-122527-ladsgroup.json
  • 12:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 12:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 12:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from test-s4 to test-s4
  • 12:22 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s4 to test-s4
  • 12:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 12:20 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T371742)', diff saved to https://phabricator.wikimedia.org/P68143 and previous config saved to /var/cache/conftool/dbconfig/20240829-121444-ladsgroup.json
  • 12:10 hnowlan: homer 'lsw1-a3-codfw*' commit
  • 12:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.renumber-node (exit_code=0) Renumbering for host wikikube-worker2031.codfw.wmnet
  • 12:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2031.codfw.wmnet
  • 12:00 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2031.codfw.wmnet
  • 11:56 claime: homer lsw1-a6-codfw* commit 'T372878'
  • 11:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2031.codfw.wmnet with OS bullseye
  • 11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T371742)', diff saved to https://phabricator.wikimedia.org/P68142 and previous config saved to /var/cache/conftool/dbconfig/20240829-115222-ladsgroup.json
  • 11:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 11:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from test-s4 to test-s4
  • 11:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T371742)', diff saved to https://phabricator.wikimedia.org/P68141 and previous config saved to /var/cache/conftool/dbconfig/20240829-115200-ladsgroup.json
  • 11:51 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2050.codfw.wmnet with OS bullseye
  • 11:51 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s4 to test-s4
  • 11:51 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:51 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:41 topranks: modify qos configuration for asw2-ulsfo xe-2/0/18 (ganeti4006) to add traffic-control-profile T339850
  • 11:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P68140 and previous config saved to /var/cache/conftool/dbconfig/20240829-113652-ladsgroup.json
  • 11:35 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
  • 11:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 11:34 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 11:32 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2050.codfw.wmnet with reason: host reimage
  • 11:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 11:32 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 11:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 11:32 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 11:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 11:31 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 11:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 11:31 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 11:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
  • 11:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 11:30 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 11:29 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2050.codfw.wmnet with reason: host reimage
  • 11:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 11:24 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 11:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 11:22 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 11:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P68139 and previous config saved to /var/cache/conftool/dbconfig/20240829-112145-ladsgroup.json
  • 11:17 claime: homer cr*codfw* commit 'T372878'
  • 11:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T370903)', diff saved to https://phabricator.wikimedia.org/P68138 and previous config saved to /var/cache/conftool/dbconfig/20240829-111351-ladsgroup.json
  • 11:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2050
  • 11:13 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2050
  • 11:13 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2050.codfw.wmnet with OS bullseye
  • 11:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f8d70a81b80>
  • 11:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2031
  • 11:11 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2031
  • 11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2031.codfw.wmnet 179.0.192.10.in-addr.arpa 9.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:11 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2031.codfw.wmnet 179.0.192.10.in-addr.arpa 9.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2031 - cgoubert@cumin1002"
  • 11:10 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2031 - cgoubert@cumin1002"
  • 11:07 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T371742)', diff saved to https://phabricator.wikimedia.org/P68137 and previous config saved to /var/cache/conftool/dbconfig/20240829-110637-ladsgroup.json
  • 11:06 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f8d70a81b80>
  • 11:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2031.codfw.wmnet with OS bullseye
  • 11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2031.codfw.wmnet
  • 11:02 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2031.codfw.wmnet
  • 11:02 cgoubert@cumin1002: START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2031.codfw.wmnet
  • 10:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P68136 and previous config saved to /var/cache/conftool/dbconfig/20240829-105844-ladsgroup.json
  • 10:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2005.wikimedia.org
  • 10:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test2005.wikimedia.org with OS bookworm
  • 10:49 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2050.codfw.wmnet with OS bullseye
  • 10:49 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2050.codfw.wmnet with OS bullseye
  • 10:48 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2050.codfw.wmnet on all recursors
  • 10:48 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2050.codfw.wmnet on all recursors
  • 10:48 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2050.codfw.wmnet with OS bullseye
  • 10:48 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2050.codfw.wmnet with OS bullseye
  • 10:47 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2380 to wikikube-worker2050
  • 10:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2050
  • 10:46 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2050
  • 10:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2380 to wikikube-worker2050 - hnowlan@cumin1002"
  • 10:44 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2380 to wikikube-worker2050 - hnowlan@cumin1002"
  • 10:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P68134 and previous config saved to /var/cache/conftool/dbconfig/20240829-104336-ladsgroup.json
  • 10:38 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 10:37 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2380 to wikikube-worker2050
  • 10:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T371742)', diff saved to https://phabricator.wikimedia.org/P68133 and previous config saved to /var/cache/conftool/dbconfig/20240829-103724-ladsgroup.json
  • 10:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 10:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.renumber-node (exit_code=0) Renumbering for host wikikube-worker2010.codfw.wmnet
  • 10:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2010.codfw.wmnet
  • 10:37 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2010.codfw.wmnet
  • 10:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 10:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T371742)', diff saved to https://phabricator.wikimedia.org/P68132 and previous config saved to /var/cache/conftool/dbconfig/20240829-103702-ladsgroup.json
  • 10:36 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow (duration: 00m 10s)
  • 10:36 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow
  • 10:34 claime: homer lsw1-b6-codfw* commit 'T372878'
  • 10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2010.codfw.wmnet with OS bullseye
  • 10:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 10:29 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 10:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T370903)', diff saved to https://phabricator.wikimedia.org/P68131 and previous config saved to /var/cache/conftool/dbconfig/20240829-102829-ladsgroup.json
  • 10:23 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 10:23 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 10:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P68130 and previous config saved to /var/cache/conftool/dbconfig/20240829-102155-ladsgroup.json
  • 10:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
  • 10:09 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
  • 10:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P68128 and previous config saved to /var/cache/conftool/dbconfig/20240829-100648-ladsgroup.json
  • 10:05 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2048.codfw.wmnet on all recursors
  • 10:05 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2048.codfw.wmnet on all recursors
  • 10:02 claime: homer cr*codfw* commit 'T372878'
  • 10:01 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2048.codfw.wmnet
  • 10:01 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2048.codfw.wmnet
  • 10:00 akosiaris: T372878 wikikube-worker2048.codfw.wmnet updated in netbox and homer running
  • 09:58 topranks: apply qos classifiers and schedulers to interfaces on ulsfo CRs T339850
  • 09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fa8c9ceef40>
  • 09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2010
  • 09:52 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2010
  • 09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2010.codfw.wmnet 198.16.192.10.in-addr.arpa 8.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:52 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2010.codfw.wmnet 198.16.192.10.in-addr.arpa 8.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2010 - cgoubert@cumin1002"
  • 09:52 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2010 - cgoubert@cumin1002"
  • 09:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T371742)', diff saved to https://phabricator.wikimedia.org/P68127 and previous config saved to /var/cache/conftool/dbconfig/20240829-095141-ladsgroup.json
  • 09:48 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:48 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fa8c9ceef40>
  • 09:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2010.codfw.wmnet with OS bullseye
  • 09:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2010.codfw.wmnet
  • 09:46 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2010.codfw.wmnet
  • 09:46 cgoubert@cumin1002: START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2010.codfw.wmnet
  • 09:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:44 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:43 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
  • 09:32 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
  • 09:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T370903)', diff saved to https://phabricator.wikimedia.org/P68126 and previous config saved to /var/cache/conftool/dbconfig/20240829-092819-ladsgroup.json
  • 09:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T371742)', diff saved to https://phabricator.wikimedia.org/P68125 and previous config saved to /var/cache/conftool/dbconfig/20240829-092547-ladsgroup.json
  • 09:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 09:24 topranks: apply qos classifiers and schedulers to interfaces on asw2-ulsfo T339850
  • 09:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "idp-test2005 - ayounsi@cumin1002"
  • 09:24 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "idp-test2005 - ayounsi@cumin1002"
  • 09:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test2005.wikimedia.org with reason: host reimage
  • 09:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2380.codfw.wmnet
  • 09:13 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2380.codfw.wmnet
  • 09:13 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test2005.wikimedia.org with reason: host reimage
  • 09:06 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow (duration: 00m 11s)
  • 09:06 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow
  • 08:59 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS bookworm
  • 08:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2005.wikimedia.org - ayounsi@cumin1002"
  • 08:58 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2005.wikimedia.org - ayounsi@cumin1002"
  • 08:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test2005.wikimedia.org on all recursors
  • 08:58 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test2005.wikimedia.org on all recursors
  • 08:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2005.wikimedia.org - ayounsi@cumin1002"
  • 08:58 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2005.wikimedia.org - ayounsi@cumin1002"
  • 08:51 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 08:51 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
  • 08:41 hashar@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.20 refs T366965
  • 07:53 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet
  • 07:47 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
  • 07:46 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host snapshot1011.eqiad.wmnet
  • 07:46 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
  • 07:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: Testing
  • 07:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: Testing
  • 07:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T371742)', diff saved to https://phabricator.wikimedia.org/P68124 and previous config saved to /var/cache/conftool/dbconfig/20240829-070017-ladsgroup.json
  • 06:55 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@cb0bc4d]: (no justification provided) (duration: 00m 03s)
  • 06:55 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@cb0bc4d]: (no justification provided)
  • 06:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P68123 and previous config saved to /var/cache/conftool/dbconfig/20240829-064508-ladsgroup.json
  • 06:30 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow (duration: 00m 10s)
  • 06:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P68122 and previous config saved to /var/cache/conftool/dbconfig/20240829-063000-ladsgroup.json
  • 06:29 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow
  • 06:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T371742)', diff saved to https://phabricator.wikimedia.org/P68121 and previous config saved to /var/cache/conftool/dbconfig/20240829-061453-ladsgroup.json
  • 04:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T371742)', diff saved to https://phabricator.wikimedia.org/P68120 and previous config saved to /var/cache/conftool/dbconfig/20240829-041348-ladsgroup.json
  • 04:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 04:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 04:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T371742)', diff saved to https://phabricator.wikimedia.org/P68119 and previous config saved to /var/cache/conftool/dbconfig/20240829-041326-ladsgroup.json
  • 03:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P68118 and previous config saved to /var/cache/conftool/dbconfig/20240829-035817-ladsgroup.json
  • 03:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P68117 and previous config saved to /var/cache/conftool/dbconfig/20240829-034310-ladsgroup.json
  • 03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T371742)', diff saved to https://phabricator.wikimedia.org/P68116 and previous config saved to /var/cache/conftool/dbconfig/20240829-032803-ladsgroup.json
  • 01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T371742)', diff saved to https://phabricator.wikimedia.org/P68115 and previous config saved to /var/cache/conftool/dbconfig/20240829-012759-ladsgroup.json
  • 01:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 01:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 01:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T371742)', diff saved to https://phabricator.wikimedia.org/P68114 and previous config saved to /var/cache/conftool/dbconfig/20240829-012736-ladsgroup.json
  • 01:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P68113 and previous config saved to /var/cache/conftool/dbconfig/20240829-011229-ladsgroup.json
  • 00:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P68112 and previous config saved to /var/cache/conftool/dbconfig/20240829-005722-ladsgroup.json
  • 00:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T371742)', diff saved to https://phabricator.wikimedia.org/P68111 and previous config saved to /var/cache/conftool/dbconfig/20240829-004215-ladsgroup.json
  • 00:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T370903)', diff saved to https://phabricator.wikimedia.org/P68110 and previous config saved to /var/cache/conftool/dbconfig/20240829-001215-ladsgroup.json

2024-08-28

  • 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P68109 and previous config saved to /var/cache/conftool/dbconfig/20240828-235708-ladsgroup.json
  • 23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P68108 and previous config saved to /var/cache/conftool/dbconfig/20240828-234201-ladsgroup.json
  • 23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T370903)', diff saved to https://phabricator.wikimedia.org/P68107 and previous config saved to /var/cache/conftool/dbconfig/20240828-232653-ladsgroup.json
  • 23:10 eileen: config revision changed from cb9b3655 to af0aadef re-enable dedupe contacts from start
  • 23:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T370903)', diff saved to https://phabricator.wikimedia.org/P68106 and previous config saved to /var/cache/conftool/dbconfig/20240828-230748-ladsgroup.json
  • 23:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 23:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 23:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T370903)', diff saved to https://phabricator.wikimedia.org/P68105 and previous config saved to /var/cache/conftool/dbconfig/20240828-230726-ladsgroup.json
  • 22:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P68104 and previous config saved to /var/cache/conftool/dbconfig/20240828-225218-ladsgroup.json
  • 22:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P68103 and previous config saved to /var/cache/conftool/dbconfig/20240828-223711-ladsgroup.json
  • 22:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T371742)', diff saved to https://phabricator.wikimedia.org/P68102 and previous config saved to /var/cache/conftool/dbconfig/20240828-223325-ladsgroup.json
  • 22:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 22:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 22:23 swfrench-wmf: running homer 'cr*codfw*' commit 'T372878'
  • 22:22 ryankemper: [WDQS] `ryankemper@wdqs1015:~$ sudo systemctl restart wdqs-blazegraph`
  • 22:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T370903)', diff saved to https://phabricator.wikimedia.org/P68101 and previous config saved to /var/cache/conftool/dbconfig/20240828-222204-ladsgroup.json
  • 22:17 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2049.codfw.wmnet
  • 22:17 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2049.codfw.wmnet
  • 22:14 swfrench-wmf: running homer 'lsw1-b3-codfw*' commit 'T372878'
  • 22:11 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2049.codfw.wmnet with OS bullseye
  • 22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T370903)', diff saved to https://phabricator.wikimedia.org/P68100 and previous config saved to /var/cache/conftool/dbconfig/20240828-220318-ladsgroup.json
  • 22:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 22:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T370903)', diff saved to https://phabricator.wikimedia.org/P68099 and previous config saved to /var/cache/conftool/dbconfig/20240828-220256-ladsgroup.json
  • 21:50 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2049.codfw.wmnet with reason: host reimage
  • 21:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P68098 and previous config saved to /var/cache/conftool/dbconfig/20240828-214749-ladsgroup.json
  • 21:46 swfrench@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2049.codfw.wmnet with reason: host reimage
  • 21:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:39 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:39 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:33 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:33 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:32 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P68097 and previous config saved to /var/cache/conftool/dbconfig/20240828-213242-ladsgroup.json
  • 21:32 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:31 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:30 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:29 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2049
  • 21:29 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2049
  • 21:28 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2049
  • 21:28 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2049.codfw.wmnet 59.16.192.10.in-addr.arpa 9.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:28 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2049.codfw.wmnet 59.16.192.10.in-addr.arpa 9.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:28 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:28 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2049 - swfrench@cumin2002"
  • 21:28 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2049 - swfrench@cumin2002"
  • 21:26 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:26 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:24 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 21:23 swfrench@cumin2002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2049
  • 21:23 swfrench@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2049.codfw.wmnet with OS bullseye
  • 21:22 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2049.codfw.wmnet on all recursors
  • 21:22 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2049.codfw.wmnet on all recursors
  • 21:21 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2029 to wikikube-worker2049
  • 21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2049
  • 21:20 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2049
  • 21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2029 to wikikube-worker2049 - swfrench@cumin2002"
  • 21:20 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2029 to wikikube-worker2049 - swfrench@cumin2002"
  • 21:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T370903)', diff saved to https://phabricator.wikimedia.org/P68096 and previous config saved to /var/cache/conftool/dbconfig/20240828-211734-ladsgroup.json
  • 21:16 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 21:15 swfrench@cumin2002: START - Cookbook sre.hosts.rename from kubernetes2029 to wikikube-worker2049
  • 21:10 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2029.codfw.wmnet
  • 21:10 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2029.codfw.wmnet
  • 20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T370903)', diff saved to https://phabricator.wikimedia.org/P68095 and previous config saved to /var/cache/conftool/dbconfig/20240828-205834-ladsgroup.json
  • 20:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T370903)', diff saved to https://phabricator.wikimedia.org/P68094 and previous config saved to /var/cache/conftool/dbconfig/20240828-205812-ladsgroup.json
  • 20:54 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 20:53 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 20:51 cjming: end of UTC late backport window
  • 20:49 cjming@deploy1003: Finished scap sync-world: Backport for auth: Relax AuthManager session state check while cde00b55 is deployed (T373504), Fix missing definition of setSaveErrorMessage too (T373288), CentralAuthApiSessionProvider: Avoid error in internal API requests (T373507) (duration: 11m 31s)
  • 20:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T371742)', diff saved to https://phabricator.wikimedia.org/P68093 and previous config saved to /var/cache/conftool/dbconfig/20240828-204715-ladsgroup.json
  • 20:44 cjming@deploy1003: matmarex, cjming: Continuing with sync
  • 20:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P68092 and previous config saved to /var/cache/conftool/dbconfig/20240828-204305-ladsgroup.json
  • 20:39 cjming@deploy1003: matmarex, cjming: Backport for auth: Relax AuthManager session state check while cde00b55 is deployed (T373504), Fix missing definition of setSaveErrorMessage too (T373288), CentralAuthApiSessionProvider: Avoid error in internal API requests (T373507) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:37 cjming@deploy1003: Started scap sync-world: Backport for auth: Relax AuthManager session state check while cde00b55 is deployed (T373504), Fix missing definition of setSaveErrorMessage too (T373288), CentralAuthApiSessionProvider: Avoid error in internal API requests (T373507)
  • 20:35 cjming@deploy1003: Finished scap sync-world: Backport for Disable HLS VP9 video tracks in TimedMediaHandler (T373546) (duration: 08m 10s)
  • 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P68091 and previous config saved to /var/cache/conftool/dbconfig/20240828-203208-ladsgroup.json
  • 20:31 cjming@deploy1003: bvibber, cjming: Continuing with sync
  • 20:29 cjming@deploy1003: bvibber, cjming: Backport for Disable HLS VP9 video tracks in TimedMediaHandler (T373546) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P68090 and previous config saved to /var/cache/conftool/dbconfig/20240828-202757-ladsgroup.json
  • 20:27 cjming@deploy1003: Started scap sync-world: Backport for Disable HLS VP9 video tracks in TimedMediaHandler (T373546)
  • 20:26 cjming@deploy1003: Finished scap sync-world: Backport for logging: Use '??=' operator to reduce repetition (duration: 06m 39s)
  • 20:21 cjming@deploy1003: cjming, matmarex: Continuing with sync
  • 20:21 cjming@deploy1003: cjming, matmarex: Backport for logging: Use '??=' operator to reduce repetition synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 cjming@deploy1003: Started scap sync-world: Backport for logging: Use '??=' operator to reduce repetition
  • 20:17 cjming@deploy1003: Finished scap sync-world: Backport for Lift IP cap on this dates 10/09, 17/09, 24/09 for edit-a-thon for eswiki, commons and wikidata (T373468) (duration: 11m 02s)
  • 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P68089 and previous config saved to /var/cache/conftool/dbconfig/20240828-201701-ladsgroup.json
  • 20:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T370903)', diff saved to https://phabricator.wikimedia.org/P68088 and previous config saved to /var/cache/conftool/dbconfig/20240828-201250-ladsgroup.json
  • 20:12 cjming@deploy1003: cjming, gergesshamon: Continuing with sync
  • 20:09 cjming@deploy1003: cjming, gergesshamon: Backport for Lift IP cap on this dates 10/09, 17/09, 24/09 for edit-a-thon for eswiki, commons and wikidata (T373468) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 cjming@deploy1003: Started scap sync-world: Backport for Lift IP cap on this dates 10/09, 17/09, 24/09 for edit-a-thon for eswiki, commons and wikidata (T373468)
  • 20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T371742)', diff saved to https://phabricator.wikimedia.org/P68087 and previous config saved to /var/cache/conftool/dbconfig/20240828-200154-ladsgroup.json
  • 19:59 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 19:58 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 19:54 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T370903)', diff saved to https://phabricator.wikimedia.org/P68086 and previous config saved to /var/cache/conftool/dbconfig/20240828-195401-ladsgroup.json
  • 19:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68085 and previous config saved to /var/cache/conftool/dbconfig/20240828-195339-ladsgroup.json
  • 19:51 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P68084 and previous config saved to /var/cache/conftool/dbconfig/20240828-193832-ladsgroup.json
  • 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P68083 and previous config saved to /var/cache/conftool/dbconfig/20240828-192325-ladsgroup.json
  • 19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68082 and previous config saved to /var/cache/conftool/dbconfig/20240828-190817-ladsgroup.json
  • 19:02 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 18:59 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2048.codfw.wmnet with OS bullseye
  • 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68081 and previous config saved to /var/cache/conftool/dbconfig/20240828-184950-ladsgroup.json
  • 18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T370903)', diff saved to https://phabricator.wikimedia.org/P68080 and previous config saved to /var/cache/conftool/dbconfig/20240828-184923-ladsgroup.json
  • 18:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2048.codfw.wmnet with reason: host reimage
  • 18:36 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2048.codfw.wmnet with reason: host reimage
  • 18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P68079 and previous config saved to /var/cache/conftool/dbconfig/20240828-183416-ladsgroup.json
  • 18:19 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2048
  • 18:19 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2048
  • 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P68078 and previous config saved to /var/cache/conftool/dbconfig/20240828-181908-ladsgroup.json
  • 18:18 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2048
  • 18:18 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2048.codfw.wmnet 164.0.192.10.in-addr.arpa 4.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 18:18 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2048.codfw.wmnet 164.0.192.10.in-addr.arpa 4.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 18:18 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:18 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2048 - akosiaris@cumin1002"
  • 18:18 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2048 - akosiaris@cumin1002"
  • 18:16 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2045.codfw.wmnet
  • 18:16 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2045.codfw.wmnet
  • 18:15 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 18:14 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2048
  • 18:14 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2048.codfw.wmnet with OS bullseye
  • 18:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2294 to wikikube-worker2048
  • 18:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2048
  • 18:08 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2048
  • 18:08 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:08 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2294 to wikikube-worker2048 - akosiaris@cumin1002"
  • 18:08 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2294 to wikikube-worker2048 - akosiaris@cumin1002"
  • 18:04 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 18:04 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2294 to wikikube-worker2048
  • 18:04 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2045.codfw.wmnet with OS bullseye
  • 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T370903)', diff saved to https://phabricator.wikimedia.org/P68077 and previous config saved to /var/cache/conftool/dbconfig/20240828-180401-ladsgroup.json
  • 18:00 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 17:57 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2294.codfw.wmnet
  • 17:57 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2294.codfw.wmnet
  • 17:48 ejegg: fundraising civicrm upgraded from e3aead7d to 916cad45
  • 17:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T371742)', diff saved to https://phabricator.wikimedia.org/P68076 and previous config saved to /var/cache/conftool/dbconfig/20240828-174811-ladsgroup.json
  • 17:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 17:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 17:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T371742)', diff saved to https://phabricator.wikimedia.org/P68075 and previous config saved to /var/cache/conftool/dbconfig/20240828-174749-ladsgroup.json
  • 17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T370903)', diff saved to https://phabricator.wikimedia.org/P68074 and previous config saved to /var/cache/conftool/dbconfig/20240828-174514-ladsgroup.json
  • 17:45 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2045.codfw.wmnet with reason: host reimage
  • 17:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 17:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 17:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 17:42 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2045.codfw.wmnet with reason: host reimage
  • 17:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P68073 and previous config saved to /var/cache/conftool/dbconfig/20240828-173242-ladsgroup.json
  • 17:30 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@cb0bc4d]: (no justification provided) (duration: 00m 18s)
  • 17:29 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@cb0bc4d]: (no justification provided)
  • 17:27 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T370903)', diff saved to https://phabricator.wikimedia.org/P68072 and previous config saved to /var/cache/conftool/dbconfig/20240828-172653-ladsgroup.json
  • 17:24 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2045.codfw.wmnet with OS bullseye
  • 17:22 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:22 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:22 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P68071 and previous config saved to /var/cache/conftool/dbconfig/20240828-171735-ladsgroup.json
  • 17:14 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P68070 and previous config saved to /var/cache/conftool/dbconfig/20240828-171146-ladsgroup.json
  • 17:05 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:03 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:02 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T371742)', diff saved to https://phabricator.wikimedia.org/P68069 and previous config saved to /var/cache/conftool/dbconfig/20240828-170228-ladsgroup.json
  • 17:02 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:01 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:00 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:59 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:59 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P68068 and previous config saved to /var/cache/conftool/dbconfig/20240828-165638-ladsgroup.json
  • 16:51 topranks: add qos config to management firewalls T339850
  • 16:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 16:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 16:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T370903)', diff saved to https://phabricator.wikimedia.org/P68067 and previous config saved to /var/cache/conftool/dbconfig/20240828-164131-ladsgroup.json
  • 16:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 16:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2009.codfw.wmnet
  • 16:35 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2009.codfw.wmnet
  • 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 16:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2009.codfw.wmnet
  • 16:32 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2009.codfw.wmnet
  • 16:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2009.codfw.wmnet
  • 16:30 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2009.codfw.wmnet
  • 16:26 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:26 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:24 hnowlan@deploy1003: Finished scap sync-world: Backport for timedmediahandler: revert using shellbox for commonswiki (T373517) (duration: 07m 13s)
  • 16:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2127 (T370903)', diff saved to https://phabricator.wikimedia.org/P68066 and previous config saved to /var/cache/conftool/dbconfig/20240828-162239-ladsgroup.json
  • 16:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1009.eqiad.wmnet with OS bookworm
  • 16:20 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:20 hnowlan@deploy1003: hnowlan: Continuing with sync
  • 16:20 hnowlan@deploy1003: hnowlan: Backport for timedmediahandler: revert using shellbox for commonswiki (T373517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:17 hnowlan@deploy1003: Started scap sync-world: Backport for timedmediahandler: revert using shellbox for commonswiki (T373517)
  • 16:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:07 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2009.codfw.wmnet
  • 16:06 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T370903)', diff saved to https://phabricator.wikimedia.org/P68065 and previous config saved to /var/cache/conftool/dbconfig/20240828-160354-ladsgroup.json
  • 16:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1009.eqiad.wmnet with reason: host reimage
  • 16:02 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:01 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:01 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 15:59 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1009.eqiad.wmnet with reason: host reimage
  • 15:52 urandom: TRUNCATE-ing RESTBase tables (`{commons,enwiki,others,wikipedia}_T_mobileoZCBVtILw5eSrwi0VIGaFVSr2jY`) — T342148
  • 15:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 15:49 claime: homer lsw1-b6-codfw* commit 'T372878'
  • 15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P68063 and previous config saved to /var/cache/conftool/dbconfig/20240828-154846-ladsgroup.json
  • 15:47 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@0b23c91]: Test Refine through Airflow (duration: 00m 11s)
  • 15:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1009.eqiad.wmnet with OS bookworm
  • 15:47 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@0b23c91]: Test Refine through Airflow
  • 15:45 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 15:40 claime: homer cr*codfw* commit 'T372878'
  • 15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2009.codfw.wmnet with OS bullseye
  • 15:34 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:34 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P68062 and previous config saved to /var/cache/conftool/dbconfig/20240828-153338-ladsgroup.json
  • 15:23 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
  • 15:23 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/toolhub: sync
  • 15:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1011.eqiad.wmnet with OS bookworm
  • 15:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:22 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/toolhub: sync
  • 15:22 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/toolhub: sync
  • 15:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
  • 15:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T370903)', diff saved to https://phabricator.wikimedia.org/P68061 and previous config saved to /var/cache/conftool/dbconfig/20240828-151831-ladsgroup.json
  • 15:18 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:17 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:16 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
  • 15:14 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T370903)', diff saved to https://phabricator.wikimedia.org/P68060 and previous config saved to /var/cache/conftool/dbconfig/20240828-151404-ladsgroup.json
  • 15:14 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 15:13 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:13 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:13 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68059 and previous config saved to /var/cache/conftool/dbconfig/20240828-151342-ladsgroup.json
  • 15:11 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 15:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-lab1002
  • 15:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1002
  • 15:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1011.eqiad.wmnet with reason: host reimage
  • 15:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1011.eqiad.wmnet with reason: host reimage
  • 14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f9ac7a901f0>
  • 14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2009
  • 14:59 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2009
  • 14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2009.codfw.wmnet 197.16.192.10.in-addr.arpa 7.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:59 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2009.codfw.wmnet 197.16.192.10.in-addr.arpa 7.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2009 - cgoubert@cumin1002"
  • 14:59 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2009 - cgoubert@cumin1002"
  • 14:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P68058 and previous config saved to /var/cache/conftool/dbconfig/20240828-145835-ladsgroup.json
  • 14:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T371742)', diff saved to https://phabricator.wikimedia.org/P68057 and previous config saved to /var/cache/conftool/dbconfig/20240828-145651-ladsgroup.json
  • 14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T371742)', diff saved to https://phabricator.wikimedia.org/P68056 and previous config saved to /var/cache/conftool/dbconfig/20240828-145629-ladsgroup.json
  • 14:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f9ac7a901f0>
  • 14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2009.codfw.wmnet with OS bullseye
  • 14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2009.codfw.wmnet
  • 14:54 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2009.codfw.wmnet
  • 14:54 cgoubert@cumin1002: START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2009.codfw.wmnet
  • 14:50 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1011.eqiad.wmnet with OS bookworm
  • 14:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P68054 and previous config saved to /var/cache/conftool/dbconfig/20240828-144328-ladsgroup.json
  • 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P68053 and previous config saved to /var/cache/conftool/dbconfig/20240828-144122-ladsgroup.json
  • 14:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1010.eqiad.wmnet with OS bookworm
  • 14:36 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:35 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68052 and previous config saved to /var/cache/conftool/dbconfig/20240828-142821-ladsgroup.json
  • 14:26 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:26 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P68051 and previous config saved to /var/cache/conftool/dbconfig/20240828-142615-ladsgroup.json
  • 14:25 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
  • 14:24 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68050 and previous config saved to /var/cache/conftool/dbconfig/20240828-142355-ladsgroup.json
  • 14:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T370903)', diff saved to https://phabricator.wikimedia.org/P68049 and previous config saved to /var/cache/conftool/dbconfig/20240828-142315-ladsgroup.json
  • 14:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1010.eqiad.wmnet with reason: host reimage
  • 14:20 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:19 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:19 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:19 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:18 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:18 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1010.eqiad.wmnet with reason: host reimage
  • 14:00 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 14:00 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 13:59 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 13:59 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 13:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1010.eqiad.wmnet with OS bookworm
  • 13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P68046 and previous config saved to /var/cache/conftool/dbconfig/20240828-135300-ladsgroup.json
  • 13:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2009.codfw.wmnet
  • 13:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2009.codfw.wmnet
  • 13:45 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 13:40 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 13:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s1 to test-s1
  • 13:38 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
  • 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T370903)', diff saved to https://phabricator.wikimedia.org/P68045 and previous config saved to /var/cache/conftool/dbconfig/20240828-133753-ladsgroup.json
  • 13:36 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from test-s1 to test-s1
  • 13:36 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
  • 13:36 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T370903)', diff saved to https://phabricator.wikimedia.org/P68044 and previous config saved to /var/cache/conftool/dbconfig/20240828-133346-ladsgroup.json
  • 13:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68043 and previous config saved to /var/cache/conftool/dbconfig/20240828-133323-ladsgroup.json
  • 13:31 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from test-s1 to test-s1
  • 13:31 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
  • 13:31 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P68042 and previous config saved to /var/cache/conftool/dbconfig/20240828-131815-ladsgroup.json
  • 13:10 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 13:10 elukey@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 13:04 topranks: rolling out config additions of qos schedulers and policers to all network devices T339850
  • 13:03 godog: delete 2023 5m blocks from thanos - T351927
  • 13:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P68041 and previous config saved to /var/cache/conftool/dbconfig/20240828-130308-ladsgroup.json
  • 12:58 Dreamy_Jazz: Started MediaModeration scan on enwiki, time limited to 24hrs - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 12:57 sukhe: sudo ipmitool -I lanplus -H "puppetserver1002.mgmt.eqiad.wmnet" -U root -E chassis power cycle
  • 12:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68040 and previous config saved to /var/cache/conftool/dbconfig/20240828-124801-ladsgroup.json
  • 12:45 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:44 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:41 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:40 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:39 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:39 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from test-s1 to test-s1
  • 12:29 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s1 to test-s1
  • 12:28 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.finalize (exit_code=99) for the switch from test-s1 to test-s1
  • 12:28 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s1 to test-s1
  • 12:27 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.finalize (exit_code=99) for the switch from test-s1 to test-s1
  • 12:27 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s1 to test-s1
  • 12:23 MichaelG_WMF: T371228 running foreachwikiindblist growthexperiments ./extensions/CommunityConfiguration/maintenance/setVersionData.php HelpPanel 1.0.0
  • 12:22 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.finalize (exit_code=97) for the switch from test-s1 to test-s1
  • 12:22 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s1 to test-s1
  • 12:19 MichaelG_WMF: T371228 running mwscript --wiki testwiki ./extensions/CommunityConfiguration/maintenance/setVersionData.php HelpPanel 1.0.0
  • 11:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T371742)', diff saved to https://phabricator.wikimedia.org/P68039 and previous config saved to /var/cache/conftool/dbconfig/20240828-115123-ladsgroup.json
  • 11:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T371742)', diff saved to https://phabricator.wikimedia.org/P68038 and previous config saved to /var/cache/conftool/dbconfig/20240828-115057-ladsgroup.json
  • 11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68037 and previous config saved to /var/cache/conftool/dbconfig/20240828-114745-ladsgroup.json
  • 11:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T370903)', diff saved to https://phabricator.wikimedia.org/P68036 and previous config saved to /var/cache/conftool/dbconfig/20240828-114722-ladsgroup.json
  • 11:44 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 11:43 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 11:43 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:42 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:42 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:42 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:41 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:40 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P68035 and previous config saved to /var/cache/conftool/dbconfig/20240828-113549-ladsgroup.json
  • 11:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P68034 and previous config saved to /var/cache/conftool/dbconfig/20240828-113215-ladsgroup.json
  • 11:23 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 11:22 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 11:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P68033 and previous config saved to /var/cache/conftool/dbconfig/20240828-112042-ladsgroup.json
  • 11:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P68032 and previous config saved to /var/cache/conftool/dbconfig/20240828-111708-ladsgroup.json
  • 11:17 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 11:14 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:12 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:10 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Maintain ranked order of candidates in STV vote summary (T373499) (duration: 06m 44s)
  • 11:06 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 11:06 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 11:06 dreamyjazz@deploy1003: dreamyjazz: Backport for Maintain ranked order of candidates in STV vote summary (T373499) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T371742)', diff saved to https://phabricator.wikimedia.org/P68031 and previous config saved to /var/cache/conftool/dbconfig/20240828-110535-ladsgroup.json
  • 11:03 dreamyjazz@deploy1003: Started scap sync-world: Backport for Maintain ranked order of candidates in STV vote summary (T373499)
  • 11:02 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 11:02 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T370903)', diff saved to https://phabricator.wikimedia.org/P68030 and previous config saved to /var/cache/conftool/dbconfig/20240828-110200-ladsgroup.json
  • 10:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T370903)', diff saved to https://phabricator.wikimedia.org/P68029 and previous config saved to /var/cache/conftool/dbconfig/20240828-105757-ladsgroup.json
  • 10:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T370903)', diff saved to https://phabricator.wikimedia.org/P68028 and previous config saved to /var/cache/conftool/dbconfig/20240828-105735-ladsgroup.json
  • 10:50 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.7.0 with updated plugin - cmooney@cumin1002
  • 10:48 ladsgroup@deploy1003: Finished scap sync-world: Backport for Set ruwiki to non simple UI (T372694) (duration: 10m 48s)
  • 10:44 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 10:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P68027 and previous config saved to /var/cache/conftool/dbconfig/20240828-104228-ladsgroup.json
  • 10:42 ladsgroup@deploy1003: ladsgroup: Backport for Set ruwiki to non simple UI (T372694) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:41 godog: start prometheus2005 bookworm upgrade - T326657
  • 10:40 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.7.0 with updated plugin - cmooney@cumin1002
  • 10:38 ladsgroup@deploy1003: Started scap sync-world: Backport for Set ruwiki to non simple UI (T372694)
  • 10:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 10:27 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 10:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 10:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P68026 and previous config saved to /var/cache/conftool/dbconfig/20240828-102721-ladsgroup.json
  • 10:24 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 10:12 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from test-s1 to test-s1
  • 10:12 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
  • 10:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T370903)', diff saved to https://phabricator.wikimedia.org/P68025 and previous config saved to /var/cache/conftool/dbconfig/20240828-101214-ladsgroup.json
  • 10:11 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the switch from test-s1 to test-s1
  • 10:11 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
  • 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T370903)', diff saved to https://phabricator.wikimedia.org/P68024 and previous config saved to /var/cache/conftool/dbconfig/20240828-100803-ladsgroup.json
  • 10:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:07 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the switch from test-s1 to test-s1
  • 10:07 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
  • 10:05 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the switch from test-s1 to test-s1
  • 10:05 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
  • 10:01 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 09:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:57 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the (test) switch
  • 09:57 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 09:57 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
  • 09:54 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 09:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 09:49 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
  • 09:48 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 09:40 godog: start prometheus1005 bookworm upgrade - T326657
  • 09:36 claime: homer 'cr*codfw*' commit 'T372878'
  • 09:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2043.codfw.wmnet
  • 09:35 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2043.codfw.wmnet
  • 09:35 claime: pooling wikikube-worker2043.codfw.wmnet - T372878
  • 09:34 claime: homer 'lsw1-a3-codfw*' commit T372878
  • 09:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the (test) switch
  • 09:02 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 08:52 jayme: running homer commit on cr*codfw* - T372878
  • 08:50 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2047.codfw.wmnet
  • 08:50 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2047.codfw.wmnet
  • 08:48 jayme: running homer commit on lsw1-a6-codfw* - T372878
  • 08:46 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
  • 08:45 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 08:45 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2047.codfw.wmnet with OS bullseye
  • 08:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T371742)', diff saved to https://phabricator.wikimedia.org/P68023 and previous config saved to /var/cache/conftool/dbconfig/20240828-084045-ladsgroup.json
  • 08:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 08:37 hashar@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.20 refs T366965
  • 08:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2047.codfw.wmnet with reason: host reimage
  • 08:22 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2047.codfw.wmnet with reason: host reimage
  • 08:21 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
  • 08:21 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 08:21 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the (test) switch
  • 08:05 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f4a5bda6340>
  • 08:05 jayme@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2047
  • 08:04 jayme@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2047
  • 08:04 jayme@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2047.codfw.wmnet 196.0.192.10.in-addr.arpa 6.9.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 08:04 jayme@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2047.codfw.wmnet 196.0.192.10.in-addr.arpa 6.9.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 08:04 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:04 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2047 - jayme@cumin1002"
  • 08:04 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2047 - jayme@cumin1002"
  • 07:59 jayme@cumin1002: START - Cookbook sre.dns.netbox
  • 07:58 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 07:58 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the (test) switch
  • 07:58 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 07:54 jayme@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f4a5bda6340>
  • 07:54 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2047.codfw.wmnet with OS bullseye
  • 07:54 jayme@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2047.codfw.wmnet on all recursors
  • 07:54 jayme@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2047.codfw.wmnet on all recursors
  • 07:53 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2007 to wikikube-worker2047
  • 07:52 jayme@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2047
  • 07:52 jayme@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2047
  • 07:52 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:52 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2007 to wikikube-worker2047 - jayme@cumin1002"
  • 07:51 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2007 to wikikube-worker2047 - jayme@cumin1002"
  • 07:44 jayme@cumin1002: START - Cookbook sre.dns.netbox
  • 07:44 jayme@cumin1002: START - Cookbook sre.hosts.rename from kubernetes2007 to wikikube-worker2047
  • 07:31 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2007.codfw.wmnet
  • 07:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2007.codfw.wmnet
  • 06:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 06:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 06:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T371742)', diff saved to https://phabricator.wikimedia.org/P68022 and previous config saved to /var/cache/conftool/dbconfig/20240828-062759-ladsgroup.json
  • 06:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P68021 and previous config saved to /var/cache/conftool/dbconfig/20240828-061252-ladsgroup.json
  • 06:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 06:01 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 05:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 05:59 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 05:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P68020 and previous config saved to /var/cache/conftool/dbconfig/20240828-055744-ladsgroup.json
  • 05:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T371742)', diff saved to https://phabricator.wikimedia.org/P68019 and previous config saved to /var/cache/conftool/dbconfig/20240828-054237-ladsgroup.json
  • 03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T371742)', diff saved to https://phabricator.wikimedia.org/P68018 and previous config saved to /var/cache/conftool/dbconfig/20240828-033211-ladsgroup.json
  • 03:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T371742)', diff saved to https://phabricator.wikimedia.org/P68017 and previous config saved to /var/cache/conftool/dbconfig/20240828-033149-ladsgroup.json
  • 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P68016 and previous config saved to /var/cache/conftool/dbconfig/20240828-031642-ladsgroup.json
  • 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P68015 and previous config saved to /var/cache/conftool/dbconfig/20240828-030135-ladsgroup.json
  • 02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T371742)', diff saved to https://phabricator.wikimedia.org/P68014 and previous config saved to /var/cache/conftool/dbconfig/20240828-024627-ladsgroup.json
  • 02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T371742)', diff saved to https://phabricator.wikimedia.org/P68013 and previous config saved to /var/cache/conftool/dbconfig/20240828-020145-ladsgroup.json
  • 02:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 02:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68012 and previous config saved to /var/cache/conftool/dbconfig/20240828-013903-ladsgroup.json
  • 01:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P68011 and previous config saved to /var/cache/conftool/dbconfig/20240828-012356-ladsgroup.json
  • 01:21 ejegg: payments-wiki upgraded from f6a3be41 to 54988ad9
  • 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P68010 and previous config saved to /var/cache/conftool/dbconfig/20240828-010849-ladsgroup.json
  • 00:59 ejegg: payments-wiki upgraded from 0455b791 to f6a3be41
  • 00:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68009 and previous config saved to /var/cache/conftool/dbconfig/20240828-005342-ladsgroup.json
  • 00:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68008 and previous config saved to /var/cache/conftool/dbconfig/20240828-004702-ladsgroup.json
  • 00:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 00:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 00:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T370903)', diff saved to https://phabricator.wikimedia.org/P68007 and previous config saved to /var/cache/conftool/dbconfig/20240828-004639-ladsgroup.json
  • 00:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P68006 and previous config saved to /var/cache/conftool/dbconfig/20240828-003132-ladsgroup.json
  • 00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P68005 and previous config saved to /var/cache/conftool/dbconfig/20240828-001625-ladsgroup.json
  • 00:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T371742)', diff saved to https://phabricator.wikimedia.org/P68004 and previous config saved to /var/cache/conftool/dbconfig/20240828-001214-ladsgroup.json
  • 00:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T370903)', diff saved to https://phabricator.wikimedia.org/P68003 and previous config saved to /var/cache/conftool/dbconfig/20240828-000117-ladsgroup.json

2024-08-27

  • 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P68002 and previous config saved to /var/cache/conftool/dbconfig/20240827-235707-ladsgroup.json
  • 23:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T370903)', diff saved to https://phabricator.wikimedia.org/P68001 and previous config saved to /var/cache/conftool/dbconfig/20240827-235426-ladsgroup.json
  • 23:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 23:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 23:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 23:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P68000 and previous config saved to /var/cache/conftool/dbconfig/20240827-234200-ladsgroup.json
  • 23:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 23:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 23:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T370903)', diff saved to https://phabricator.wikimedia.org/P67999 and previous config saved to /var/cache/conftool/dbconfig/20240827-233854-ladsgroup.json
  • 23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T371742)', diff saved to https://phabricator.wikimedia.org/P67998 and previous config saved to /var/cache/conftool/dbconfig/20240827-232653-ladsgroup.json
  • 23:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P67997 and previous config saved to /var/cache/conftool/dbconfig/20240827-232346-ladsgroup.json
  • 23:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P67996 and previous config saved to /var/cache/conftool/dbconfig/20240827-230839-ladsgroup.json
  • 22:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T370903)', diff saved to https://phabricator.wikimedia.org/P67995 and previous config saved to /var/cache/conftool/dbconfig/20240827-225332-ladsgroup.json
  • 22:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T370903)', diff saved to https://phabricator.wikimedia.org/P67994 and previous config saved to /var/cache/conftool/dbconfig/20240827-224542-ladsgroup.json
  • 22:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67993 and previous config saved to /var/cache/conftool/dbconfig/20240827-224520-ladsgroup.json
  • 22:34 cstone: civicrm upgraded from f70d753c to e3aead7d
  • 22:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P67992 and previous config saved to /var/cache/conftool/dbconfig/20240827-223013-ladsgroup.json
  • 22:15 swfrench-wmf: running homer 'cr*codfw*' commit 'T372878'
  • 22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P67991 and previous config saved to /var/cache/conftool/dbconfig/20240827-221506-ladsgroup.json
  • 22:07 swfrench-wmf: pooled / uncordoned wikikube-worker2046.codfw.wmnet - T372878
  • 22:06 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2046.codfw.wmnet
  • 22:06 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2046.codfw.wmnet
  • 22:04 swfrench-wmf: Running homer 'lsw1-a8-codfw*' commit 'T372878'
  • 22:01 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2046.codfw.wmnet with OS bullseye
  • 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67990 and previous config saved to /var/cache/conftool/dbconfig/20240827-215958-ladsgroup.json
  • 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67989 and previous config saved to /var/cache/conftool/dbconfig/20240827-215230-ladsgroup.json
  • 21:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 21:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T370903)', diff saved to https://phabricator.wikimedia.org/P67988 and previous config saved to /var/cache/conftool/dbconfig/20240827-215208-ladsgroup.json
  • 21:41 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2046.codfw.wmnet with reason: host reimage
  • 21:38 swfrench@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2046.codfw.wmnet with reason: host reimage
  • 21:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P67987 and previous config saved to /var/cache/conftool/dbconfig/20240827-213700-ladsgroup.json
  • 21:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P67986 and previous config saved to /var/cache/conftool/dbconfig/20240827-212153-ladsgroup.json
  • 21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f46e8b0b1c0>
  • 21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2046
  • 21:20 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2046
  • 21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2046.codfw.wmnet 69.0.192.10.in-addr.arpa 9.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:20 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2046.codfw.wmnet 69.0.192.10.in-addr.arpa 9.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2046 - swfrench@cumin2002"
  • 21:20 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2046 - swfrench@cumin2002"
  • 21:15 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T371742)', diff saved to https://phabricator.wikimedia.org/P67985 and previous config saved to /var/cache/conftool/dbconfig/20240827-211538-ladsgroup.json
  • 21:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 21:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T371742)', diff saved to https://phabricator.wikimedia.org/P67984 and previous config saved to /var/cache/conftool/dbconfig/20240827-211516-ladsgroup.json
  • 21:15 swfrench@cumin2002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f46e8b0b1c0>
  • 21:14 swfrench@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2046.codfw.wmnet with OS bullseye
  • 21:13 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2046.codfw.wmnet on all recursors
  • 21:13 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2046.codfw.wmnet on all recursors
  • 21:12 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2026 to wikikube-worker2046
  • 21:12 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2046
  • 21:11 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2046
  • 21:11 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:11 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2026 to wikikube-worker2046 - swfrench@cumin2002"
  • 21:11 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2026 to wikikube-worker2046 - swfrench@cumin2002"
  • 21:07 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 21:07 swfrench@cumin2002: START - Cookbook sre.hosts.rename from kubernetes2026 to wikikube-worker2046
  • 21:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T370903)', diff saved to https://phabricator.wikimedia.org/P67983 and previous config saved to /var/cache/conftool/dbconfig/20240827-210646-ladsgroup.json
  • 21:06 zabe@deploy1003: Finished scap sync-world: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis (T372789), Rollback Parsoid+Kartographer rollout on hewiki and commons (T373454 T373460) (duration: 10m 55s)
  • 21:01 zabe@deploy1003: ihurbain, zabe, cscott: Continuing with sync
  • 21:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67982 and previous config saved to /var/cache/conftool/dbconfig/20240827-210008-ladsgroup.json
  • 20:59 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2026.codfw.wmnet
  • 20:58 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2026.codfw.wmnet
  • 20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T370903)', diff saved to https://phabricator.wikimedia.org/P67981 and previous config saved to /var/cache/conftool/dbconfig/20240827-205855-ladsgroup.json
  • 20:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T370903)', diff saved to https://phabricator.wikimedia.org/P67980 and previous config saved to /var/cache/conftool/dbconfig/20240827-205817-ladsgroup.json
  • 20:57 zabe@deploy1003: ihurbain, zabe, cscott: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis (T372789), Rollback Parsoid+Kartographer rollout on hewiki and commons (T373454 T373460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:55 zabe@deploy1003: Started scap sync-world: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis (T372789), Rollback Parsoid+Kartographer rollout on hewiki and commons (T373454 T373460)
  • 20:53 zabe@deploy1003: Finished scap sync-world: Backport for Remove warning on non-existing category (T373454), Remove warning on non-existing category (T373454) (duration: 08m 11s)
  • 20:53 mstyles@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 20:53 mstyles@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 20:52 mstyles@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 20:52 mstyles@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 20:52 mstyles@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 20:52 mstyles@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 20:52 mstyles@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 20:51 mstyles@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 20:51 mstyles@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 20:49 mstyles@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 20:49 zabe@deploy1003: cscott, zabe: Continuing with sync
  • 20:48 zabe@deploy1003: cscott, zabe: Backport for Remove warning on non-existing category (T373454), Remove warning on non-existing category (T373454) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:45 zabe@deploy1003: Started scap sync-world: Backport for Remove warning on non-existing category (T373454), Remove warning on non-existing category (T373454)
  • 20:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67979 and previous config saved to /var/cache/conftool/dbconfig/20240827-204501-ladsgroup.json
  • 20:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P67978 and previous config saved to /var/cache/conftool/dbconfig/20240827-204310-ladsgroup.json
  • 20:38 zabe@deploy1003: Finished scap sync-world: Backport for Revert "Allow gadget/browser extension extensibility of empty search state" (T373463), Tweak styling of compact Parsoid indicator (T372789) (duration: 13m 23s)
  • 20:33 zabe@deploy1003: cscott, zabe, jdlrobson: Continuing with sync
  • 20:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T371742)', diff saved to https://phabricator.wikimedia.org/P67977 and previous config saved to /var/cache/conftool/dbconfig/20240827-202954-ladsgroup.json
  • 20:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P67976 and previous config saved to /var/cache/conftool/dbconfig/20240827-202803-ladsgroup.json
  • 20:27 zabe@deploy1003: cscott, zabe, jdlrobson: Backport for Revert "Allow gadget/browser extension extensibility of empty search state" (T373463), Tweak styling of compact Parsoid indicator (T372789) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:27 bking@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2043.codfw.wmnet
  • 20:27 bking@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2043.codfw.wmnet
  • 20:24 zabe@deploy1003: Started scap sync-world: Backport for Revert "Allow gadget/browser extension extensibility of empty search state" (T373463), Tweak styling of compact Parsoid indicator (T372789)
  • 20:22 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:22 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:22 zabe@deploy1003: Finished scap sync-world: Backport for Disable mobile Watchlist on wikidata since its broken (T263633) (duration: 09m 39s)
  • 20:17 zabe@deploy1003: jdlrobson, zabe: Continuing with sync
  • 20:15 zabe@deploy1003: jdlrobson, zabe: Backport for Disable mobile Watchlist on wikidata since its broken (T263633) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T370903)', diff saved to https://phabricator.wikimedia.org/P67975 and previous config saved to /var/cache/conftool/dbconfig/20240827-201256-ladsgroup.json
  • 20:12 zabe@deploy1003: Started scap sync-world: Backport for Disable mobile Watchlist on wikidata since its broken (T263633)
  • 20:12 zabe@deploy1003: Finished scap sync-world: Backport for Turn account vanishing contact form into a redirect. (T372828), Revert "[svwikt] Add a temporary logo for the 100.000 pages" (T364247) (duration: 11m 28s)
  • 20:05 zabe@deploy1003: dbrant, zabe, pppery: Continuing with sync
  • 20:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T370903)', diff saved to https://phabricator.wikimedia.org/P67974 and previous config saved to /var/cache/conftool/dbconfig/20240827-200459-ladsgroup.json
  • 20:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 20:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T370903)', diff saved to https://phabricator.wikimedia.org/P67973 and previous config saved to /var/cache/conftool/dbconfig/20240827-200437-ladsgroup.json
  • 20:04 zabe@deploy1003: dbrant, zabe, pppery: Backport for Turn account vanishing contact form into a redirect. (T372828), Revert "[svwikt] Add a temporary logo for the 100.000 pages" (T364247) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:01 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:01 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:00 zabe@deploy1003: Started scap sync-world: Backport for Turn account vanishing contact form into a redirect. (T372828), Revert "[svwikt] Add a temporary logo for the 100.000 pages" (T364247)
  • 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P67972 and previous config saved to /var/cache/conftool/dbconfig/20240827-194930-ladsgroup.json
  • 19:44 zabe@deploy1003: Finished scap sync-world: Backport for Update uzwiki logo (T370165) (duration: 17m 07s)
  • 19:38 zabe@deploy1003: zabe: Continuing with sync
  • 19:37 zabe@deploy1003: zabe: Backport for Update uzwiki logo (T370165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P67971 and previous config saved to /var/cache/conftool/dbconfig/20240827-193424-ladsgroup.json
  • 19:27 zabe@deploy1003: Started scap sync-world: Backport for Update uzwiki logo (T370165)
  • 19:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T370903)', diff saved to https://phabricator.wikimedia.org/P67970 and previous config saved to /var/cache/conftool/dbconfig/20240827-191915-ladsgroup.json
  • 19:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T370903)', diff saved to https://phabricator.wikimedia.org/P67969 and previous config saved to /var/cache/conftool/dbconfig/20240827-191116-ladsgroup.json
  • 19:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 19:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 19:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T370903)', diff saved to https://phabricator.wikimedia.org/P67968 and previous config saved to /var/cache/conftool/dbconfig/20240827-191053-ladsgroup.json
  • 19:01 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-lab1001
  • 19:01 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1001
  • 19:01 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
  • 19:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
  • 18:58 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 18:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P67967 and previous config saved to /var/cache/conftool/dbconfig/20240827-185546-ladsgroup.json
  • 18:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P67966 and previous config saved to /var/cache/conftool/dbconfig/20240827-184039-ladsgroup.json
  • 18:38 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:38 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:33 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:33 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T370903)', diff saved to https://phabricator.wikimedia.org/P67965 and previous config saved to /var/cache/conftool/dbconfig/20240827-182531-ladsgroup.json
  • 18:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2121 (T370903)', diff saved to https://phabricator.wikimedia.org/P67964 and previous config saved to /var/cache/conftool/dbconfig/20240827-181732-ladsgroup.json
  • 18:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 18:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T370903)', diff saved to https://phabricator.wikimedia.org/P67963 and previous config saved to /var/cache/conftool/dbconfig/20240827-181653-ladsgroup.json
  • 18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T371742)', diff saved to https://phabricator.wikimedia.org/P67962 and previous config saved to /var/cache/conftool/dbconfig/20240827-181020-ladsgroup.json
  • 18:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 18:10 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-lab1001
  • 18:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 18:10 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1001
  • 18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T371742)', diff saved to https://phabricator.wikimedia.org/P67961 and previous config saved to /var/cache/conftool/dbconfig/20240827-180958-ladsgroup.json
  • 18:06 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1009
  • 18:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1009
  • 18:05 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-lab1002
  • 18:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1002
  • 18:05 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ml-lab1001
  • 18:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1001
  • 18:05 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve1011
  • 18:04 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve1011
  • 18:04 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve1010
  • 18:04 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve1010
  • 18:03 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve1009
  • 18:02 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve1009
  • 18:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P67960 and previous config saved to /var/cache/conftool/dbconfig/20240827-180146-ladsgroup.json
  • 17:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67959 and previous config saved to /var/cache/conftool/dbconfig/20240827-175451-ladsgroup.json
  • 17:54 ryankemper: T364368 Our LVS operation is done; I've enabled and run puppet on the remaining lvs hosts
  • 17:50 ryankemper: T364368 Ran puppet on `A:lvs-low-traffic-codfw`, restarted `pybal.service`, and cleared away old ipvs entries for `10.2.1.33:80` and `10.2.1.36:80`
  • 17:47 ryankemper: T364368 Ran puppet on `A:lvs-secondary-codfw`, restarted `pybal.service`, and cleared away old ipvs entries for `10.2.1.33:80` and `10.2.1.36:80`
  • 17:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P67957 and previous config saved to /var/cache/conftool/dbconfig/20240827-174639-ladsgroup.json
  • 17:43 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
  • 17:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
  • 17:42 ryankemper: Typo, meant to say forced recheck on `lvs1019` to clear alert
  • 17:41 ryankemper: Forced recheck on lvs2019 to clear alert
  • 17:40 ryankemper: T364368 Cleared away old ipvs entries for `10.2.2.33:80` and `10.2.2.36:80`
  • 17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67956 and previous config saved to /var/cache/conftool/dbconfig/20240827-173944-ladsgroup.json
  • 17:38 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 17:37 ryankemper: T364368 Ran puppet on `A:lvs-low-traffic-eqiad` and restarted `pybal.service`
  • 17:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T370903)', diff saved to https://phabricator.wikimedia.org/P67954 and previous config saved to /var/cache/conftool/dbconfig/20240827-173132-ladsgroup.json
  • 17:30 sukhe: force recheck on Icinga for lvs1020
  • 17:30 sukhe: sukhe@lvs1020:~$ sudo ipvsadm --delete-service --tcp-service 10.2.2.33:80
  • 17:29 sukhe: sukhe@lvs1020:~$ sudo ipvsadm --delete-service --tcp-service 10.2.2.36:80
  • 17:24 ryankemper: T364368 `ryankemper@cumin2002:~$ sudo cumin 'A:lvs-secondary-eqiad' 'systemctl status pybal.service'`
  • 17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T371742)', diff saved to https://phabricator.wikimedia.org/P67953 and previous config saved to /var/cache/conftool/dbconfig/20240827-172436-ladsgroup.json
  • 17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T370903)', diff saved to https://phabricator.wikimedia.org/P67952 and previous config saved to /var/cache/conftool/dbconfig/20240827-172401-ladsgroup.json
  • 17:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 17:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 17:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T370903)', diff saved to https://phabricator.wikimedia.org/P67951 and previous config saved to /var/cache/conftool/dbconfig/20240827-172339-ladsgroup.json
  • 17:13 ryankemper: T364368 Ran puppet on `A:lvs-secondary-eqiad` and restarted pybal.service
  • 17:08 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2045.codfw.wmnet with OS bullseye
  • 17:08 ryankemper: T364368 Disabled puppet on all lvs hosts in preparation for rolling restart
  • 17:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P67950 and previous config saved to /var/cache/conftool/dbconfig/20240827-170832-ladsgroup.json
  • 16:56 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5002.eqsin.wmnet
  • 16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P67949 and previous config saved to /var/cache/conftool/dbconfig/20240827-165325-ladsgroup.json
  • 16:50 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
  • 16:45 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
  • 16:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
  • 16:42 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T370903)', diff saved to https://phabricator.wikimedia.org/P67948 and previous config saved to /var/cache/conftool/dbconfig/20240827-163817-ladsgroup.json
  • 16:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T367856)', diff saved to https://phabricator.wikimedia.org/P67947 and previous config saved to /var/cache/conftool/dbconfig/20240827-163407-marostegui.json
  • 16:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 16:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 16:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T367856)', diff saved to https://phabricator.wikimedia.org/P67946 and previous config saved to /var/cache/conftool/dbconfig/20240827-163345-marostegui.json
  • 16:25 denisse: Start prometheus5002 Bookworm upgrade - T326657
  • 16:21 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
  • 16:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P67945 and previous config saved to /var/cache/conftool/dbconfig/20240827-161837-marostegui.json
  • 16:17 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
  • 16:13 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2044.codfw.wmnet
  • 16:13 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2044.codfw.wmnet
  • 16:12 kamila_: ran homer to add wikikube-worker2044 T372878
  • 16:05 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2044.codfw.wmnet with OS bullseye
  • 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T370903)', diff saved to https://phabricator.wikimedia.org/P67944 and previous config saved to /var/cache/conftool/dbconfig/20240827-160403-ladsgroup.json
  • 16:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 16:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T370903)', diff saved to https://phabricator.wikimedia.org/P67943 and previous config saved to /var/cache/conftool/dbconfig/20240827-160341-ladsgroup.json
  • 16:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P67942 and previous config saved to /var/cache/conftool/dbconfig/20240827-160330-marostegui.json
  • 16:03 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2037.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:59 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2037.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:58 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2036.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:57 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2036.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:57 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2035.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:54 denisse: Start prometheus4002 Bookworm upgrade - T326657
  • 15:52 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2035.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:52 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f7528213c70>
  • 15:52 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2045
  • 15:51 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2034.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:50 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2045
  • 15:50 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2045.codfw.wmnet 163.0.192.10.in-addr.arpa 3.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:50 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2045.codfw.wmnet 163.0.192.10.in-addr.arpa 3.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:50 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:50 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2045 - akosiaris@cumin1002"
  • 15:50 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2045 - akosiaris@cumin1002"
  • 15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P67941 and previous config saved to /var/cache/conftool/dbconfig/20240827-154834-ladsgroup.json
  • 15:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T367856)', diff saved to https://phabricator.wikimedia.org/P67940 and previous config saved to /var/cache/conftool/dbconfig/20240827-154823-marostegui.json
  • 15:46 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2034.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:45 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2033.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:45 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2044.codfw.wmnet with reason: host reimage
  • 15:44 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 15:43 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f7528213c70>
  • 15:43 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2045.codfw.wmnet with OS bullseye
  • 15:43 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2293 to wikikube-worker2045
  • 15:42 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2044.codfw.wmnet with reason: host reimage
  • 15:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2045
  • 15:42 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2045
  • 15:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2293 to wikikube-worker2045 - akosiaris@cumin1002"
  • 15:39 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2033.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:39 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2293 to wikikube-worker2045 - akosiaris@cumin1002"
  • 15:39 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2029.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:36 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2029.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:35 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 15:35 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2293 to wikikube-worker2045
  • 15:35 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2028.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:33 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2028.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P67939 and previous config saved to /var/cache/conftool/dbconfig/20240827-153327-ladsgroup.json
  • 15:31 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2027.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:29 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2027.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 15:27 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f958d5462b0>
  • 15:27 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2044
  • 15:26 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2044
  • 15:26 kamila@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2044.codfw.wmnet 207.0.192.10.in-addr.arpa 7.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:26 kamila@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2044.codfw.wmnet 207.0.192.10.in-addr.arpa 7.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:26 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:26 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2044 - kamila@cumin1002"
  • 15:26 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2044 - kamila@cumin1002"
  • 15:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:23 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:22 kamila@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f958d5462b0>
  • 15:22 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2044.codfw.wmnet with OS bullseye
  • 15:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 100%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67937 and previous config saved to /var/cache/conftool/dbconfig/20240827-152031-arnaudb.json
  • 15:20 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2019 to wikikube-worker2044
  • 15:19 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2044
  • 15:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:19 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2044
  • 15:19 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:19 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2019 to wikikube-worker2044 - kamila@cumin1002"
  • 15:19 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2019 to wikikube-worker2044 - kamila@cumin1002"
  • 15:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T370903)', diff saved to https://phabricator.wikimedia.org/P67936 and previous config saved to /var/cache/conftool/dbconfig/20240827-151819-ladsgroup.json
  • 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T370903)', diff saved to https://phabricator.wikimedia.org/P67935 and previous config saved to /var/cache/conftool/dbconfig/20240827-151610-ladsgroup.json
  • 15:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T370903)', diff saved to https://phabricator.wikimedia.org/P67934 and previous config saved to /var/cache/conftool/dbconfig/20240827-151548-ladsgroup.json
  • 15:15 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:15 kamila@cumin1002: START - Cookbook sre.hosts.rename from kubernetes2019 to wikikube-worker2044
  • 15:11 elukey: restart httpd and librenms-syslog.service on netmon1003 for libaom upgrades
  • 15:11 elukey: restart httpd on crm2001 for libaom upgrades
  • 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T371742)', diff saved to https://phabricator.wikimedia.org/P67933 and previous config saved to /var/cache/conftool/dbconfig/20240827-150952-ladsgroup.json
  • 15:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 75%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67932 and previous config saved to /var/cache/conftool/dbconfig/20240827-150525-arnaudb.json
  • 15:02 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2003.codfw.wmnet
  • 15:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P67931 and previous config saved to /var/cache/conftool/dbconfig/20240827-150041-ladsgroup.json
  • 14:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2232.codfw.wmnet with OS bookworm
  • 14:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2231.codfw.wmnet with OS bookworm
  • 14:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2230.codfw.wmnet with OS bookworm
  • 14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 50%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67930 and previous config saved to /var/cache/conftool/dbconfig/20240827-145020-arnaudb.json
  • 14:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P67929 and previous config saved to /var/cache/conftool/dbconfig/20240827-144534-ladsgroup.json
  • 14:44 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2293.codfw.wmnet
  • 14:41 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2293.codfw.wmnet
  • 14:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wikikube-ctrl2003.codfw.wmnet with reason: running provision again
  • 14:41 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on wikikube-ctrl2003.codfw.wmnet with reason: running provision again
  • 14:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2232.codfw.wmnet with reason: host reimage
  • 14:40 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=wikikube-ctrl2003.codfw.wmnet
  • 14:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2231.codfw.wmnet with reason: host reimage
  • 14:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 25%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67928 and previous config saved to /var/cache/conftool/dbconfig/20240827-143514-arnaudb.json
  • 14:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
  • 14:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2231.codfw.wmnet with reason: host reimage
  • 14:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2232.codfw.wmnet with reason: host reimage
  • 14:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
  • 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T370903)', diff saved to https://phabricator.wikimedia.org/P67927 and previous config saved to /var/cache/conftool/dbconfig/20240827-143027-ladsgroup.json
  • 14:29 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding AAAA field to wdqs101[1-3] and wdqs200[7-8] - brouberol@cumin1002"
  • 14:29 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding AAAA field to wdqs101[1-3] and wdqs200[7-8] - brouberol@cumin1002"
  • 14:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on db2186.codfw.wmnet with reason: Schema change
  • 14:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on db2186.codfw.wmnet with reason: Schema change
  • 14:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2186.codfw.wmnet with reason: Schema change
  • 14:26 brouberol@cumin1002: START - Cookbook sre.dns.netbox
  • 14:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2186.codfw.wmnet with reason: Schema change
  • 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T370903)', diff saved to https://phabricator.wikimedia.org/P67926 and previous config saved to /var/cache/conftool/dbconfig/20240827-142516-ladsgroup.json
  • 14:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T370903)', diff saved to https://phabricator.wikimedia.org/P67925 and previous config saved to /var/cache/conftool/dbconfig/20240827-142454-ladsgroup.json
  • 14:24 marostegui: Update zarcillo db for pc4 master T373340
  • 14:20 akosiaris: T372878 uncordon wikikube-worker2043
  • 14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 15%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67924 and previous config saved to /var/cache/conftool/dbconfig/20240827-142009-arnaudb.json
  • 14:18 tappof@cumin2002: END (FAIL) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=99) rolling restart_daemons on P{O:logging::opensearch::data and logs*.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Switch pc4 master to pc2015 T373340', diff saved to https://phabricator.wikimedia.org/P67923 and previous config saved to /var/cache/conftool/dbconfig/20240827-141845-marostegui.json
  • 14:18 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2232.codfw.wmnet with OS bookworm
  • 14:18 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2231.codfw.wmnet with OS bookworm
  • 14:17 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2230.codfw.wmnet with OS bookworm
  • 14:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2015-2016].codfw.wmnet,pc[1015-1016].eqiad.wmnet with reason: Switchover
  • 14:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2015-2016].codfw.wmnet,pc[1015-1016].eqiad.wmnet with reason: Switchover
  • 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P67920 and previous config saved to /var/cache/conftool/dbconfig/20240827-135440-ladsgroup.json
  • 13:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 3%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67919 and previous config saved to /var/cache/conftool/dbconfig/20240827-134958-arnaudb.json
  • 13:48 XioNoX: add bgpalerter to bookworm-wikimedia apt repo - T372909
  • 13:47 XioNoX: add routinator to bookworm-wikimedia apt repo - T372909
  • 13:46 zabe: zabe@mwmaint1002:~$ foreachwikiindblist private wrapOldPasswords.php --type BEP --update # T91917
  • 13:46 zabe: zabe@mwmaint1002:~$ foreachwikiindblist fishbowl wrapOldPasswords.php --type BEP --update # T91917
  • 13:45 zabe: zabe@mwmaint1002:~$ foreachwikiindblist private sql.php --query "UPDATE user SET user_password = CONCAT(':B:', user_id, ':', user_password) WHERE user_password RLIKE '^[0-9a-f]{32}$';" # T91917
  • 13:44 zabe: zabe@mwmaint1002:~$ foreachwikiindblist fishbowl sql.php --query "UPDATE user SET user_password = CONCAT(':B:', user_id, ':', user_password) WHERE user_password RLIKE '^[0-9a-f]{32}$';" # T91917
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T370903)', diff saved to https://phabricator.wikimedia.org/P67918 and previous config saved to /var/cache/conftool/dbconfig/20240827-133933-ladsgroup.json
  • 13:37 zabe@deploy1003: Finished scap sync-world: Backport for Register feature flag for moving wikibase item to Other Projects sidebar in pilot wikis., Enable CampaignEvents Invitation Lists in production testing environments (T373041) (duration: 31m 27s)
  • 13:37 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::collector and log*.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T370903)', diff saved to https://phabricator.wikimedia.org/P67917 and previous config saved to /var/cache/conftool/dbconfig/20240827-133723-ladsgroup.json
  • 13:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T370903)', diff saved to https://phabricator.wikimedia.org/P67915 and previous config saved to /var/cache/conftool/dbconfig/20240827-133701-ladsgroup.json
  • 13:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 2%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67914 and previous config saved to /var/cache/conftool/dbconfig/20240827-133452-arnaudb.json
  • 13:33 zabe@deploy1003: joelyrookewmde, daimona, zabe: Continuing with sync
  • 13:29 Daimona: Creating new DB tables for the CampaignEvents extension in x1.testwiki, x1.test2wiki, x1.officewiki, and x1.wikishared # T369303
  • 13:23 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::collector and log*.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
  • 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P67913 and previous config saved to /var/cache/conftool/dbconfig/20240827-132154-ladsgroup.json
  • 13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 1%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67912 and previous config saved to /var/cache/conftool/dbconfig/20240827-131947-arnaudb.json
  • 13:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T371742)', diff saved to https://phabricator.wikimedia.org/P67911 and previous config saved to /var/cache/conftool/dbconfig/20240827-131031-ladsgroup.json
  • 13:08 zabe@deploy1003: joelyrookewmde, daimona, zabe: Backport for Register feature flag for moving wikibase item to Other Projects sidebar in pilot wikis., Enable CampaignEvents Invitation Lists in production testing environments (T373041) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P67910 and previous config saved to /var/cache/conftool/dbconfig/20240827-130647-ladsgroup.json
  • 13:06 zabe@deploy1003: Started scap sync-world: Backport for Register feature flag for moving wikibase item to Other Projects sidebar in pilot wikis., Enable CampaignEvents Invitation Lists in production testing environments (T373041)
  • 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P67909 and previous config saved to /var/cache/conftool/dbconfig/20240827-125523-ladsgroup.json
  • 12:55 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2043.codfw.wmnet with OS bullseye
  • 12:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T370903)', diff saved to https://phabricator.wikimedia.org/P67908 and previous config saved to /var/cache/conftool/dbconfig/20240827-125139-ladsgroup.json
  • 12:49 zabe: zabe@mwmaint1002:~$ foreachwikiindblist fishbowl wrapOldPasswords.php --type BEP --update # T91917
  • 12:46 zabe: zabe@mwmaint1002:~$ foreachwikiindblist private wrapOldPasswords.php --type BEP --update # T91917
  • 12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T370903)', diff saved to https://phabricator.wikimedia.org/P67907 and previous config saved to /var/cache/conftool/dbconfig/20240827-124629-ladsgroup.json
  • 12:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P67906 and previous config saved to /var/cache/conftool/dbconfig/20240827-124016-ladsgroup.json
  • 12:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T370903)', diff saved to https://phabricator.wikimedia.org/P67905 and previous config saved to /var/cache/conftool/dbconfig/20240827-123839-ladsgroup.json
  • 12:38 zabe@deploy1003: Finished scap sync-world: Backport for Revert apparent fix (T368712) (duration: 08m 20s)
  • 12:35 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2043.codfw.wmnet with reason: host reimage
  • 12:34 zabe@deploy1003: zabe: Continuing with sync
  • 12:33 zabe@deploy1003: zabe: Backport for Revert apparent fix (T368712) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:32 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2043.codfw.wmnet with reason: host reimage
  • 12:30 zabe@deploy1003: Started scap sync-world: Backport for Revert apparent fix (T368712)
  • 12:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67904 and previous config saved to /var/cache/conftool/dbconfig/20240827-122910-arnaudb.json
  • 12:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T371742)', diff saved to https://phabricator.wikimedia.org/P67903 and previous config saved to /var/cache/conftool/dbconfig/20240827-122509-ladsgroup.json
  • 12:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P67902 and previous config saved to /var/cache/conftool/dbconfig/20240827-122332-ladsgroup.json
  • 12:18 hashar@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.20 refs T366965
  • 12:15 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fa3b11fc520>
  • 12:15 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2043
  • 12:15 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2019.codfw.wmnet
  • 12:14 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2043
  • 12:14 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2043.codfw.wmnet 162.0.192.10.in-addr.arpa 2.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:14 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2043.codfw.wmnet 162.0.192.10.in-addr.arpa 2.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:14 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:14 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2043 - akosiaris@cumin1002"
  • 12:14 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2043 - akosiaris@cumin1002"
  • 12:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67901 and previous config saved to /var/cache/conftool/dbconfig/20240827-121405-arnaudb.json
  • 12:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67900 and previous config saved to /var/cache/conftool/dbconfig/20240827-121216-arnaudb.json
  • 12:11 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2019.codfw.wmnet
  • 12:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2015.codfw.wmnet with reason: Network maintenance
  • 12:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc2015.codfw.wmnet with reason: Network maintenance
  • 12:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P67899 and previous config saved to /var/cache/conftool/dbconfig/20240827-120825-ladsgroup.json
  • 12:02 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 12:01 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fa3b11fc520>
  • 12:01 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2043.codfw.wmnet with OS bullseye
  • 12:00 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2292 to wikikube-worker2043
  • 12:00 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2043
  • 11:59 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2043
  • 11:59 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:59 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2292 to wikikube-worker2043 - akosiaris@cumin1002"
  • 11:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67898 and previous config saved to /var/cache/conftool/dbconfig/20240827-115859-arnaudb.json
  • 11:58 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2292 to wikikube-worker2043 - akosiaris@cumin1002"
  • 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67897 and previous config saved to /var/cache/conftool/dbconfig/20240827-115711-arnaudb.json
  • 11:54 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 11:53 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2292 to wikikube-worker2043
  • 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T370903)', diff saved to https://phabricator.wikimedia.org/P67896 and previous config saved to /var/cache/conftool/dbconfig/20240827-115318-ladsgroup.json
  • 11:51 kart_: Updated cxserver to 2024-08-27-045705-production (T369815)
  • 11:50 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:49 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:46 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:46 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T370903)', diff saved to https://phabricator.wikimedia.org/P67895 and previous config saved to /var/cache/conftool/dbconfig/20240827-114608-ladsgroup.json
  • 11:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67894 and previous config saved to /var/cache/conftool/dbconfig/20240827-114546-ladsgroup.json
  • 11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67893 and previous config saved to /var/cache/conftool/dbconfig/20240827-114354-arnaudb.json
  • 11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67892 and previous config saved to /var/cache/conftool/dbconfig/20240827-114205-arnaudb.json
  • 11:39 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7001.magru.wmnet
  • 11:38 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:38 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:33 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus7001.magru.wmnet
  • 11:32 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2292.codfw.wmnet
  • 11:31 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2292.codfw.wmnet
  • 11:30 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
  • 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P67891 and previous config saved to /var/cache/conftool/dbconfig/20240827-113039-ladsgroup.json
  • 11:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 15%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67890 and previous config saved to /var/cache/conftool/dbconfig/20240827-112848-arnaudb.json
  • 11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67889 and previous config saved to /var/cache/conftool/dbconfig/20240827-112700-arnaudb.json
  • 11:24 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
  • 11:20 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database cswikivoyage (T370912)
  • 11:20 hashar@deploy1003: Finished scap sync-world: testwikis to 1.43.0-wmf.20 refs T366965 (duration: 47m 15s)
  • 11:20 godog: start prometheus7001 bookworm upgrade - T326657
  • 11:19 claime: Deleting misbehaving pod ipoid-production-daily-updates-28742340-h5ckx - T373427
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P67887 and previous config saved to /var/cache/conftool/dbconfig/20240827-111532-ladsgroup.json
  • 11:14 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-main2001.codfw.wmnet
  • 11:14 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:14 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-main2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin1002"
  • 11:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67886 and previous config saved to /var/cache/conftool/dbconfig/20240827-111343-arnaudb.json
  • 11:13 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-main2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin1002"
  • 11:12 godog: start prometheus6002 bookworm upgrade - T326657
  • 11:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 16%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67885 and previous config saved to /var/cache/conftool/dbconfig/20240827-111154-arnaudb.json
  • 11:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2030.codfw.wmnet
  • 11:05 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2030.codfw.wmnet
  • 11:00 Dreamy_Jazz: Starting MediaModeration time limited scan on group0 to make up monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 11:00 jayme@cumin1002: START - Cookbook sre.dns.netbox
  • 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67884 and previous config saved to /var/cache/conftool/dbconfig/20240827-110024-ladsgroup.json
  • 11:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2030.codfw.wmnet with OS bullseye
  • 10:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67883 and previous config saved to /var/cache/conftool/dbconfig/20240827-105837-arnaudb.json
  • 10:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67882 and previous config saved to /var/cache/conftool/dbconfig/20240827-105815-ladsgroup.json
  • 10:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 8%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67881 and previous config saved to /var/cache/conftool/dbconfig/20240827-105649-arnaudb.json
  • 10:55 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database cswikivoyage (T370912)
  • 10:54 jayme@cumin1002: START - Cookbook sre.hosts.decommission for hosts kafka-main2001.codfw.wmnet
  • 10:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2018.codfw.wmnet
  • 10:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2018.codfw.wmnet
  • 10:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2028.codfw.wmnet
  • 10:48 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2028.codfw.wmnet
  • 10:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2017.codfw.wmnet
  • 10:48 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2017.codfw.wmnet
  • 10:46 claime: Running homer 'lsw1-a6-codfw*' commit 'T372878'
  • 10:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 2%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67880 and previous config saved to /var/cache/conftool/dbconfig/20240827-104332-arnaudb.json
  • 10:43 claime: Running homer 'lsw1-a5-codfw*' commit 'T372878'
  • 10:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 6%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67879 and previous config saved to /var/cache/conftool/dbconfig/20240827-104143-arnaudb.json
  • 10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2018.codfw.wmnet with OS bullseye
  • 10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
  • 10:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
  • 10:33 hashar@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.20 refs T366965
  • 10:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2028.codfw.wmnet with OS bullseye
  • 10:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67878 and previous config saved to /var/cache/conftool/dbconfig/20240827-102827-arnaudb.json
  • 10:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 4%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67877 and previous config saved to /var/cache/conftool/dbconfig/20240827-102638-arnaudb.json
  • 10:26 claime: homer 'cr*codfw*' commit 'T372878'
  • 10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2017.codfw.wmnet with OS bullseye
  • 10:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
  • 10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fa8baa9bd90>
  • 10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2030
  • 10:19 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2030
  • 10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2030.codfw.wmnet 177.0.192.10.in-addr.arpa 7.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:19 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2030.codfw.wmnet 177.0.192.10.in-addr.arpa 7.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2030 - cgoubert@cumin1002"
  • 10:19 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2030 - cgoubert@cumin1002"
  • 10:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
  • 10:16 hashar@deploy1003: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.43.0-wmf.20" --no-progress --store-class=LCStoreCDB --threads=22 --lang en --quiet ' returned non-zero exit status 1. (duration: 00m 02s)
  • 10:16 hashar@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.20 refs T366965
  • 10:14 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:14 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fa8baa9bd90>
  • 10:14 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2030.codfw.wmnet with OS bullseye
  • 10:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2124.codfw.wmnet with reason: replag
  • 10:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2124.codfw.wmnet with reason: replag
  • 10:13 hashar@deploy1003: scap failed: PermissionError [Errno 13] Permission denied: '/srv/mediawiki-staging/php-1.43.0-wmf.20/cache/gitinfo' (duration: 00m 00s)
  • 10:13 hashar@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.20 refs T366965
  • 10:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2030.codfw.wmnet
  • 10:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 2%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67876 and previous config saved to /var/cache/conftool/dbconfig/20240827-101132-arnaudb.json
  • 10:11 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2030.codfw.wmnet
  • 10:10 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
  • 10:09 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:07 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:07 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
  • 10:06 hashar@deploy1003: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.43.0-wmf.20" --no-progress --store-class=LCStoreCDB --threads=22 --lang en --quiet ' returned non-zero exit status 1. (duration: 00m 02s)
  • 10:06 hashar@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.20 refs T366965
  • 10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T371742)', diff saved to https://phabricator.wikimedia.org/P67875 and previous config saved to /var/cache/conftool/dbconfig/20240827-100548-ladsgroup.json
  • 10:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 10:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T371742)', diff saved to https://phabricator.wikimedia.org/P67874 and previous config saved to /var/cache/conftool/dbconfig/20240827-100527-ladsgroup.json
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
  • 10:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f65f7b4bd90>
  • 10:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2018
  • 10:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
  • 10:01 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2018
  • 10:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2018.codfw.wmnet 95.0.192.10.in-addr.arpa 5.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:00 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2018.codfw.wmnet 95.0.192.10.in-addr.arpa 5.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2018 - cgoubert@cumin1002"
  • 10:00 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2018 - cgoubert@cumin1002"
  • 09:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67873 and previous config saved to /var/cache/conftool/dbconfig/20240827-095627-arnaudb.json
  • 09:53 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:53 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f65f7b4bd90>
  • 09:52 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2018.codfw.wmnet with OS bullseye
  • 09:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2018.codfw.wmnet
  • 09:50 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2018.codfw.wmnet
  • 09:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P67872 and previous config saved to /var/cache/conftool/dbconfig/20240827-095019-ladsgroup.json
  • 09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f6e24a10d30>
  • 09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2028
  • 09:49 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2028
  • 09:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2028.codfw.wmnet 178.0.192.10.in-addr.arpa 8.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:49 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2028.codfw.wmnet 178.0.192.10.in-addr.arpa 8.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2028 - cgoubert@cumin1002"
  • 09:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2028 - cgoubert@cumin1002"
  • 09:45 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:45 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f6e24a10d30>
  • 09:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f057a31dd90>
  • 09:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2017
  • 09:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2028.codfw.wmnet with OS bullseye
  • 09:43 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2017
  • 09:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2017.codfw.wmnet 76.0.192.10.in-addr.arpa 6.7.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:43 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2017.codfw.wmnet 76.0.192.10.in-addr.arpa 6.7.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2017 - cgoubert@cumin1002"
  • 09:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2017 - cgoubert@cumin1002"
  • 09:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:40 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f057a31dd90>
  • 09:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2028.codfw.wmnet
  • 09:39 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2028.codfw.wmnet
  • 09:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2017.codfw.wmnet with OS bullseye
  • 09:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2017.codfw.wmnet
  • 09:36 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2017.codfw.wmnet
  • 09:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P67871 and previous config saved to /var/cache/conftool/dbconfig/20240827-093512-ladsgroup.json
  • 09:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db2124.codfw.wmnet with reason: db2124 fix
  • 09:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db2124.codfw.wmnet with reason: db2124 fix
  • 09:25 hashar: train: fast forwarded mediawiki/core wmf/1.43.0-wmf.20 from 1faf18d6570 to ef87455d7c3 # T366965
  • 09:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T371742)', diff saved to https://phabricator.wikimedia.org/P67870 and previous config saved to /var/cache/conftool/dbconfig/20240827-092005-ladsgroup.json
  • 09:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2114.codfw.wmnet
  • 09:13 marostegui@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:13 marostegui@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2114.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
  • 09:11 marostegui@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2114.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
  • 09:08 marostegui@cumin1002: START - Cookbook sre.dns.netbox
  • 09:04 tappof@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::collector and logstash*.codfw.wmnet} and (A:logstash-collector)
  • 09:02 marostegui@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2114.codfw.wmnet
  • 09:01 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@8d2d8fe] (releasing): (no justification provided) (duration: 00m 48s)
  • 09:01 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2232.codfw.wmnet with OS bookworm
  • 09:00 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@8d2d8fe] (releasing): (no justification provided)
  • 09:00 tappof@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on P{O:logging::opensearch::collector and logstash*.codfw.wmnet} and (A:logstash-collector)
  • 08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2124', diff saved to https://phabricator.wikimedia.org/P67868 and previous config saved to /var/cache/conftool/dbconfig/20240827-085551-arnaudb.json
  • 08:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db1161.eqiad.wmnet with reason: db1161 upgrade
  • 08:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db1161.eqiad.wmnet with reason: db1161 upgrade
  • 08:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1161.eqiad.wmnet
  • 08:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on an-redacteddb1001.eqiad.wmnet with reason: upgrading db1161
  • 08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on an-redacteddb1001.eqiad.wmnet with reason: upgrading db1161
  • 08:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1154.eqiad.wmnet with reason: upgrading db1161
  • 08:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1154.eqiad.wmnet with reason: upgrading db1161
  • 08:39 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1161.eqiad.wmnet
  • 08:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: db1161 upgrade
  • 08:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: db1161 upgrade
  • 08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db1161 - T373328', diff saved to https://phabricator.wikimedia.org/P67867 and previous config saved to /var/cache/conftool/dbconfig/20240827-082923-arnaudb.json
  • 08:18 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2232.codfw.wmnet with OS bookworm
  • 08:18 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2230.codfw.wmnet with OS bookworm
  • 08:18 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2231.codfw.wmnet with OS bookworm
  • 08:18 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2232.codfw.wmnet with OS bookworm
  • 08:01 urbanecm: Clear throttle for 105.113.127.170 via resetAuthenticationThrottle.php (T373414)
  • 08:00 urbanecm@deploy1003: Finished scap sync-world: Backport for Add throttle rule for Wikimedia Hausa edit-a-thon (T373414) (duration: 06m 42s)
  • 07:53 urbanecm@deploy1003: Started scap sync-world: Backport for Add throttle rule for Wikimedia Hausa edit-a-thon (T373414)
  • 07:50 godog: ack probedown for puppetmaster:8181 - T373369
  • 07:49 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 07:45 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 07:26 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2231.codfw.wmnet with OS bookworm
  • 07:26 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2230.codfw.wmnet with OS bookworm
  • 07:22 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2232.codfw.wmnet with OS bookworm
  • 07:12 kartik@deploy1003: Finished scap sync-world: Backport for Section Translation: Fix some language codes (duration: 08m 09s)
  • 07:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T371742)', diff saved to https://phabricator.wikimedia.org/P67866 and previous config saved to /var/cache/conftool/dbconfig/20240827-070845-ladsgroup.json
  • 07:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T371742)', diff saved to https://phabricator.wikimedia.org/P67865 and previous config saved to /var/cache/conftool/dbconfig/20240827-070823-ladsgroup.json
  • 07:07 kartik@deploy1003: kartik: Continuing with sync
  • 07:06 kartik@deploy1003: kartik: Backport for Section Translation: Fix some language codes synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:04 kartik@deploy1003: Started scap sync-world: Backport for Section Translation: Fix some language codes
  • 06:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P67864 and previous config saved to /var/cache/conftool/dbconfig/20240827-065316-ladsgroup.json
  • 06:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P67863 and previous config saved to /var/cache/conftool/dbconfig/20240827-063809-ladsgroup.json
  • 06:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T371742)', diff saved to https://phabricator.wikimedia.org/P67862 and previous config saved to /var/cache/conftool/dbconfig/20240827-062302-ladsgroup.json
  • 05:34 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@0b23c91]: (no justification provided) (duration: 00m 18s)
  • 05:33 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@0b23c91]: (no justification provided)
  • 04:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T371742)', diff saved to https://phabricator.wikimedia.org/P67861 and previous config saved to /var/cache/conftool/dbconfig/20240827-041446-ladsgroup.json
  • 04:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 04:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 04:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T371742)', diff saved to https://phabricator.wikimedia.org/P67860 and previous config saved to /var/cache/conftool/dbconfig/20240827-041424-ladsgroup.json
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.17 (duration: 01m 28s)
  • 03:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P67859 and previous config saved to /var/cache/conftool/dbconfig/20240827-035916-ladsgroup.json
  • 03:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P67858 and previous config saved to /var/cache/conftool/dbconfig/20240827-034409-ladsgroup.json
  • 03:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T371742)', diff saved to https://phabricator.wikimedia.org/P67857 and previous config saved to /var/cache/conftool/dbconfig/20240827-032902-ladsgroup.json
  • 02:49 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox
  • 02:23 brett: Import corto 0.3-1 into bookworm-wikimedia apt archive
  • 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T371742)', diff saved to https://phabricator.wikimedia.org/P67856 and previous config saved to /var/cache/conftool/dbconfig/20240827-011527-ladsgroup.json
  • 01:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 01:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T371742)', diff saved to https://phabricator.wikimedia.org/P67855 and previous config saved to /var/cache/conftool/dbconfig/20240827-011505-ladsgroup.json
  • 00:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P67854 and previous config saved to /var/cache/conftool/dbconfig/20240827-005958-ladsgroup.json
  • 00:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P67853 and previous config saved to /var/cache/conftool/dbconfig/20240827-004451-ladsgroup.json
  • 00:40 dduvall@deploy1003: Finished deploy [releng/jenkins-deploy@663c843] (releasing): (no justification provided) (duration: 00m 40s)
  • 00:39 dduvall@deploy1003: Started deploy [releng/jenkins-deploy@663c843] (releasing): (no justification provided)
  • 00:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T371742)', diff saved to https://phabricator.wikimedia.org/P67852 and previous config saved to /var/cache/conftool/dbconfig/20240827-002944-ladsgroup.json

2024-08-26

  • 22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T370903)', diff saved to https://phabricator.wikimedia.org/P67851 and previous config saved to /var/cache/conftool/dbconfig/20240826-225933-ladsgroup.json
  • 22:51 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox
  • 22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P67850 and previous config saved to /var/cache/conftool/dbconfig/20240826-224426-ladsgroup.json
  • 22:36 swfrench-wmf: running homer 'cr*codfw*' commit 'T372878' (remove old BGP session config for kubernetes2018, kubernetes2025)
  • 22:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs-main.discovery.wmnet wdqs-scholarly.discovery.wmnet on all recursors
  • 22:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs-main.discovery.wmnet wdqs-scholarly.discovery.wmnet on all recursors
  • 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P67849 and previous config saved to /var/cache/conftool/dbconfig/20240826-222919-ladsgroup.json
  • 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T371742)', diff saved to https://phabricator.wikimedia.org/P67848 and previous config saved to /var/cache/conftool/dbconfig/20240826-222351-ladsgroup.json
  • 22:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 22:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T371742)', diff saved to https://phabricator.wikimedia.org/P67847 and previous config saved to /var/cache/conftool/dbconfig/20240826-222328-ladsgroup.json
  • 22:17 zabe@deploy1003: Finished scap sync-world: Backport for Removing 'spamblacklistlog' right from usergroups (T367683) (duration: 06m 58s)
  • 22:14 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2042.codfw.wmnet
  • 22:14 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2042.codfw.wmnet
  • 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T370903)', diff saved to https://phabricator.wikimedia.org/P67846 and previous config saved to /var/cache/conftool/dbconfig/20240826-221411-ladsgroup.json
  • 22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T370903)', diff saved to https://phabricator.wikimedia.org/P67845 and previous config saved to /var/cache/conftool/dbconfig/20240826-221302-ladsgroup.json
  • 22:13 zabe@deploy1003: superpes, zabe: Continuing with sync
  • 22:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 22:12 swfrench-wmf: ran homer 'lsw1-a8-codfw*' commit 'T372878'
  • 22:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 22:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 22:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 22:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T370903)', diff saved to https://phabricator.wikimedia.org/P67844 and previous config saved to /var/cache/conftool/dbconfig/20240826-221245-ladsgroup.json
  • 22:12 zabe@deploy1003: superpes, zabe: Backport for Removing 'spamblacklistlog' right from usergroups (T367683) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:10 zabe@deploy1003: Started scap sync-world: Backport for Removing 'spamblacklistlog' right from usergroups (T367683)
  • 22:10 zabe@deploy1003: Finished scap sync-world: Backport for [sysop_plwiki] Change the logo/icon and the favicon (T368712), [arbcom_itwiki] Enable importing from itwiki (T369264) (duration: 07m 13s)
  • 22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P67843 and previous config saved to /var/cache/conftool/dbconfig/20240826-220821-ladsgroup.json
  • 22:05 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2042.codfw.wmnet with OS bullseye
  • 22:05 zabe@deploy1003: superpes, zabe: Continuing with sync
  • 22:04 zabe@deploy1003: superpes, zabe: Backport for [sysop_plwiki] Change the logo/icon and the favicon (T368712), [arbcom_itwiki] Enable importing from itwiki (T369264) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:02 zabe@deploy1003: Started scap sync-world: Backport for [sysop_plwiki] Change the logo/icon and the favicon (T368712), [arbcom_itwiki] Enable importing from itwiki (T369264)
  • 22:01 inflatador: bking@dns1004.wikimedia.org `sudo -i authdns-update` T364364
  • 21:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P67842 and previous config saved to /var/cache/conftool/dbconfig/20240826-215738-ladsgroup.json
  • 21:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P67841 and previous config saved to /var/cache/conftool/dbconfig/20240826-215314-ladsgroup.json
  • 21:45 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2042.codfw.wmnet with reason: host reimage
  • 21:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P67840 and previous config saved to /var/cache/conftool/dbconfig/20240826-214230-ladsgroup.json
  • 21:41 swfrench@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2042.codfw.wmnet with reason: host reimage
  • 21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T371742)', diff saved to https://phabricator.wikimedia.org/P67839 and previous config saved to /var/cache/conftool/dbconfig/20240826-213807-ladsgroup.json
  • 21:31 catrope@deploy1003: Finished scap sync-world: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis" (duration: 31m 21s)
  • 21:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T370903)', diff saved to https://phabricator.wikimedia.org/P67838 and previous config saved to /var/cache/conftool/dbconfig/20240826-212723-ladsgroup.json
  • 21:27 catrope@deploy1003: catrope, trainbranchbot: Continuing with sync
  • 21:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T370903)', diff saved to https://phabricator.wikimedia.org/P67837 and previous config saved to /var/cache/conftool/dbconfig/20240826-212513-ladsgroup.json
  • 21:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 21:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 21:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67836 and previous config saved to /var/cache/conftool/dbconfig/20240826-212458-ladsgroup.json
  • 21:24 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fe3cb9c1700>
  • 21:24 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2042
  • 21:23 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2042
  • 21:23 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2042.codfw.wmnet 20.0.192.10.in-addr.arpa 0.2.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:23 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2042.codfw.wmnet 20.0.192.10.in-addr.arpa 0.2.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:23 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:23 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2042 - swfrench@cumin2002"
  • 21:23 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2042 - swfrench@cumin2002"
  • 21:18 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 21:17 swfrench@cumin2002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fe3cb9c1700>
  • 21:17 swfrench@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2042.codfw.wmnet with OS bullseye
  • 21:16 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2042.codfw.wmnet on all recursors
  • 21:16 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2042.codfw.wmnet on all recursors
  • 21:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P67835 and previous config saved to /var/cache/conftool/dbconfig/20240826-210951-ladsgroup.json
  • 21:09 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2025 to wikikube-worker2042
  • 21:08 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2042
  • 21:08 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2042
  • 21:08 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:08 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2025 to wikikube-worker2042 - swfrench@cumin2002"
  • 21:07 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2025 to wikikube-worker2042 - swfrench@cumin2002"
  • 21:02 catrope@deploy1003: catrope, trainbranchbot: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:02 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 21:01 swfrench@cumin2002: START - Cookbook sre.hosts.rename from kubernetes2025 to wikikube-worker2042
  • 21:00 catrope@deploy1003: Started scap sync-world: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis"
  • 20:58 catrope@deploy1003: Sync cancelled.
  • 20:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P67834 and previous config saved to /var/cache/conftool/dbconfig/20240826-205443-ladsgroup.json
  • 20:52 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2025.codfw.wmnet
  • 20:51 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2025.codfw.wmnet
  • 20:47 catrope@deploy1003: catrope, cscott: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis (T372789) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:44 catrope@deploy1003: Started scap sync-world: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis (T372789)
  • 20:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67833 and previous config saved to /var/cache/conftool/dbconfig/20240826-203936-ladsgroup.json
  • 20:39 catrope@deploy1003: Finished scap sync-world: Backport for Add Chart extension, enable in beta cluster (T369945) (duration: 29m 57s)
  • 20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67832 and previous config saved to /var/cache/conftool/dbconfig/20240826-203726-ladsgroup.json
  • 20:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 20:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T370903)', diff saved to https://phabricator.wikimedia.org/P67831 and previous config saved to /var/cache/conftool/dbconfig/20240826-203715-ladsgroup.json
  • 20:28 catrope@deploy1003: catrope: Continuing with sync
  • 20:28 catrope@deploy1003: catrope: Backport for Add Chart extension, enable in beta cluster (T369945) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P67829 and previous config saved to /var/cache/conftool/dbconfig/20240826-202208-ladsgroup.json
  • 20:21 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main
  • 20:20 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-scholarly
  • 20:09 catrope@deploy1003: Started scap sync-world: Backport for Add Chart extension, enable in beta cluster (T369945)
  • 20:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P67827 and previous config saved to /var/cache/conftool/dbconfig/20240826-200701-ladsgroup.json
  • 20:05 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2041.codfw.wmnet
  • 20:05 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2041.codfw.wmnet
  • 20:05 kamila_: run homer to add wikikube-worker2041 T372878
  • 19:59 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2041.codfw.wmnet with OS bullseye
  • 19:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T370903)', diff saved to https://phabricator.wikimedia.org/P67826 and previous config saved to /var/cache/conftool/dbconfig/20240826-195153-ladsgroup.json
  • 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T370903)', diff saved to https://phabricator.wikimedia.org/P67825 and previous config saved to /var/cache/conftool/dbconfig/20240826-194944-ladsgroup.json
  • 19:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 19:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67824 and previous config saved to /var/cache/conftool/dbconfig/20240826-194933-ladsgroup.json
  • 19:48 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-scholarly
  • 19:48 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main
  • 19:48 ryankemper: T364368 Manually adding dns discovery resources to etcd corresponding to https://wikitech.wikimedia.org/wiki/LVS#Add_the_DNS_Discovery_Record
  • 19:45 ryankemper: T364368 Merged patch to add dns discovery resources for `wdqs-main` and `wdqs-scholarly` (https://gerrit.wikimedia.org/r/c/operations/dns/+/1064831), and ran puppet on all DNS hosts
  • 19:43 ryankemper: T364368 Merged patch to move lvs state to `production` for `wdqs-main` and `wdqs-scholarly` (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1064848) and ran puppet on all LVS hosts
  • 19:42 ryankemper: T364368 [codfw] `sudo ipvsadm -L -n` on lvs primary looks good, all done with lvs restarts
  • 19:39 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2041.codfw.wmnet with reason: host reimage
  • 19:36 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2041.codfw.wmnet with reason: host reimage
  • 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P67823 and previous config saved to /var/cache/conftool/dbconfig/20240826-193425-ladsgroup.json
  • 19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T371742)', diff saved to https://phabricator.wikimedia.org/P67822 and previous config saved to /var/cache/conftool/dbconfig/20240826-193032-ladsgroup.json
  • 19:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T371742)', diff saved to https://phabricator.wikimedia.org/P67821 and previous config saved to /var/cache/conftool/dbconfig/20240826-193003-ladsgroup.json
  • 19:25 ryankemper: T364368 [codfw] `sudo ipvsadm -L -n` on lvs primary looks good, all done with lvs restarts
  • 19:24 sukhe: sukhe@alert1001:~$ sudo systemctl restart ircecho.service
  • 19:24 ryankemper: T364368 [codfw] Restarted lvs primary: `sudo cumin 'A:lvs-low-traffic-codfw' 'systemctl restart pybal.service'`
  • 19:23 ryankemper: T364368 [codfw] `sudo ipvsadm -L -n` on lvs secondary looks good, proceeding
  • 19:21 ryankemper: T280001 [codfw] Restarted lvs secondary: `sudo cumin 'A:lvs-secondary-codfw' 'systemctl restart pybal.service'`
  • 19:20 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f58347ff5e0>
  • 19:20 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2041
  • 19:20 ryankemper: T280001 [codfw] ran puppet on codfw lvs hosts, expecting alerts soon
  • 19:20 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2041
  • 19:20 kamila@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2041.codfw.wmnet 125.0.192.10.in-addr.arpa 5.2.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:20 kamila@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2041.codfw.wmnet 125.0.192.10.in-addr.arpa 5.2.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:20 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:20 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2041 - kamila@cumin1002"
  • 19:20 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2041 - kamila@cumin1002"
  • 19:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P67820 and previous config saved to /var/cache/conftool/dbconfig/20240826-191917-ladsgroup.json
  • 19:16 ryankemper: T280001 [eqiad] `sudo ipvsadm -L -n` on lvs primary looks good, proceeding
  • 19:16 ryankemper: T280001 [eqiad] Restarted lvs primary: `sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service'`
  • 19:15 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P67819 and previous config saved to /var/cache/conftool/dbconfig/20240826-191456-ladsgroup.json
  • 19:14 kamila@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f58347ff5e0>
  • 19:14 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2041.codfw.wmnet with OS bullseye
  • 19:13 ryankemper: T280001 [eqiad] `sudo ipvsadm -L -n` on lvs secondary looks good, proceeding
  • 19:13 ryankemper: T280001 [eqiad] Restarted lvs secondary: `sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service'`
  • 19:12 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2018 to wikikube-worker2041
  • 19:12 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2041
  • 19:12 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2041
  • 19:11 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:11 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2018 to wikikube-worker2041 - kamila@cumin1002"
  • 19:11 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2018 to wikikube-worker2041 - kamila@cumin1002"
  • 19:07 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 19:07 kamila@cumin1002: START - Cookbook sre.hosts.rename from kubernetes2018 to wikikube-worker2041
  • 19:06 ryankemper: T280001 [eqiad] enabled puppet on eqiad lvs hosts, expecting alerts soon
  • 19:05 ryankemper: T280001 Disabled puppet on all lvs hosts in preparation for rolling restart
  • 19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67818 and previous config saved to /var/cache/conftool/dbconfig/20240826-190411-ladsgroup.json
  • 19:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67817 and previous config saved to /var/cache/conftool/dbconfig/20240826-190201-ladsgroup.json
  • 19:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 19:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 19:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T370903)', diff saved to https://phabricator.wikimedia.org/P67816 and previous config saved to /var/cache/conftool/dbconfig/20240826-190145-ladsgroup.json
  • 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P67815 and previous config saved to /var/cache/conftool/dbconfig/20240826-185948-ladsgroup.json
  • 18:50 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1023*
  • 18:48 cstone: payments-wiki upgraded from 2551f261 to 0455b791
  • 18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P67814 and previous config saved to /var/cache/conftool/dbconfig/20240826-184638-ladsgroup.json
  • 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T371742)', diff saved to https://phabricator.wikimedia.org/P67813 and previous config saved to /var/cache/conftool/dbconfig/20240826-184441-ladsgroup.json
  • 18:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P67812 and previous config saved to /var/cache/conftool/dbconfig/20240826-183131-ladsgroup.json
  • 18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T370903)', diff saved to https://phabricator.wikimedia.org/P67811 and previous config saved to /var/cache/conftool/dbconfig/20240826-181624-ladsgroup.json
  • 18:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T370903)', diff saved to https://phabricator.wikimedia.org/P67810 and previous config saved to /var/cache/conftool/dbconfig/20240826-181414-ladsgroup.json
  • 18:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 18:11 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 18:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 18:08 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 17:53 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:52 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 17:52 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 17:51 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 17:43 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-main
  • 17:43 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-scholarly
  • 17:41 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:41 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 17:40 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2018.codfw.wmnet
  • 17:40 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 17:39 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2018.codfw.wmnet
  • 17:39 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 17:39 ryankemper: T364364 Created PTR & A records for new graph split services `wdqs-main` and `wdqs-scholarly` (merged https://gerrit.wikimedia.org/r/c/operations/dns/+/1051446 and ran `sudo authdns-update` on `dns1004.wikimedia.org`)
  • 17:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 11 hosts with reason: Maintenance
  • 17:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on 11 hosts with reason: Maintenance
  • 17:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 17:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 17:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T370903)', diff saved to https://phabricator.wikimedia.org/P67809 and previous config saved to /var/cache/conftool/dbconfig/20240826-172250-ladsgroup.json
  • 17:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P67808 and previous config saved to /var/cache/conftool/dbconfig/20240826-170742-ladsgroup.json
  • 16:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2035.codfw.wmnet
  • 16:54 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2035.codfw.wmnet
  • 16:53 claime: homer 'lsw1-b8-codfw*' commit T372878
  • 16:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2035.codfw.wmnet with OS bullseye
  • 16:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P67807 and previous config saved to /var/cache/conftool/dbconfig/20240826-165235-ladsgroup.json
  • 16:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T370903)', diff saved to https://phabricator.wikimedia.org/P67806 and previous config saved to /var/cache/conftool/dbconfig/20240826-163728-ladsgroup.json
  • 16:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2035.codfw.wmnet with reason: host reimage
  • 16:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T370903)', diff saved to https://phabricator.wikimedia.org/P67805 and previous config saved to /var/cache/conftool/dbconfig/20240826-163032-ladsgroup.json
  • 16:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2035.codfw.wmnet with reason: host reimage
  • 16:28 claime: homer 'cr*codfw*' commit 'T372878'
  • 16:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T370903)', diff saved to https://phabricator.wikimedia.org/P67804 and previous config saved to /var/cache/conftool/dbconfig/20240826-162553-ladsgroup.json
  • 16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T371742)', diff saved to https://phabricator.wikimedia.org/P67803 and previous config saved to /var/cache/conftool/dbconfig/20240826-162544-ladsgroup.json
  • 16:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67802 and previous config saved to /var/cache/conftool/dbconfig/20240826-162522-ladsgroup.json
  • 16:13 dancy@deploy1003: Stopping before sync operations
  • 16:13 dancy@deploy1003: Started scap sync-world: testing
  • 16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P67801 and previous config saved to /var/cache/conftool/dbconfig/20240826-161039-ladsgroup.json
  • 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f6bc9767d90>
  • 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2035
  • 16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P67800 and previous config saved to /var/cache/conftool/dbconfig/20240826-161015-ladsgroup.json
  • 16:10 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2035
  • 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2035.codfw.wmnet 62.16.192.10.in-addr.arpa 2.6.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:10 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2035.codfw.wmnet 62.16.192.10.in-addr.arpa 2.6.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2035 - cgoubert@cumin1002"
  • 16:10 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2035 - cgoubert@cumin1002"
  • 16:06 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 16:05 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f6bc9767d90>
  • 16:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2035.codfw.wmnet with OS bullseye
  • 16:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2001.codfw.wmnet
  • 16:03 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2001.codfw.wmnet
  • 16:01 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2001.codfw.wmnet with OS bullseye
  • 16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2001.codfw.wmnet with OS bullseye
  • 15:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2035.codfw.wmnet
  • 15:57 jdrewniak@deploy1003: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 14s)
  • 15:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2001.codfw.wmnet
  • 15:57 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2035.codfw.wmnet
  • 15:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2001.codfw.wmnet
  • 15:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P67799 and previous config saved to /var/cache/conftool/dbconfig/20240826-155531-ladsgroup.json
  • 15:55 jdrewniak@deploy1003: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 09m 39s)
  • 15:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P67798 and previous config saved to /var/cache/conftool/dbconfig/20240826-155507-ladsgroup.json
  • 15:47 sukhe: finished upgrading A:cp-eqsin to ATS 9.2.5: T339134
  • 15:47 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and A:cp for 9.2.5-1wm2
  • 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T370903)', diff saved to https://phabricator.wikimedia.org/P67797 and previous config saved to /var/cache/conftool/dbconfig/20240826-154024-ladsgroup.json
  • 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67796 and previous config saved to /var/cache/conftool/dbconfig/20240826-154000-ladsgroup.json
  • 15:37 jan_drewniak: starting Wikimedia Portals Update. https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1066804
  • 15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T370903)', diff saved to https://phabricator.wikimedia.org/P67795 and previous config saved to /var/cache/conftool/dbconfig/20240826-153415-ladsgroup.json
  • 15:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 15:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2014.codfw.wmnet
  • 15:29 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2014.codfw.wmnet
  • 15:28 claime: homer 'lsw1-a5-codfw*' commit 'T372878'
  • 15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2014.codfw.wmnet with OS bullseye
  • 15:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T370903)', diff saved to https://phabricator.wikimedia.org/P67794 and previous config saved to /var/cache/conftool/dbconfig/20240826-152715-ladsgroup.json
  • 15:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P67793 and previous config saved to /var/cache/conftool/dbconfig/20240826-151207-ladsgroup.json
  • 15:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
  • 15:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
  • 15:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2008.codfw.wmnet
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2008.codfw.wmnet
  • 15:02 claime: homer 'lsw1-b6-codfw*' commit T372878
  • 15:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host rpki2003.codfw.wmnet
  • 15:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rpki2003.codfw.wmnet with OS bookworm
  • 15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2008.codfw.wmnet with OS bullseye
  • 14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P67792 and previous config saved to /var/cache/conftool/dbconfig/20240826-145700-ladsgroup.json
  • 14:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2034.codfw.wmnet
  • 14:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2034.codfw.wmnet
  • 14:55 claime: homer 'lsw1-a3-codfw*' commit T372878
  • 14:54 claime: homer 'lsw-a3-codfw*' commit T372878
  • 14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2034.codfw.wmnet with OS bullseye
  • 14:50 claime: Running homer 'cr*codfw*' commit 'T372878'
  • 14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f15affd4d00>
  • 14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2014
  • 14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2004.codfw.wmnet
  • 14:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2004.codfw.wmnet
  • 14:49 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2014
  • 14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2014.codfw.wmnet 70.0.192.10.in-addr.arpa 0.7.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:49 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2014.codfw.wmnet 70.0.192.10.in-addr.arpa 0.7.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2014 - cgoubert@cumin1002"
  • 14:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2014 - cgoubert@cumin1002"
  • 14:47 claime: homer 'lsw1-b3-codfw*' commit T372878
  • 14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2004.codfw.wmnet with OS bullseye
  • 14:45 dancy@deploy1003: Installation of scap version "4.100.0" completed for 211 hosts
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f15affd4d00>
  • 14:44 dancy@deploy1003: Installing scap version "4.100.0" for 211 hosts
  • 14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2014.codfw.wmnet with OS bullseye
  • 14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2014.codfw.wmnet
  • 14:41 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2014.codfw.wmnet
  • 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T370903)', diff saved to https://phabricator.wikimedia.org/P67791 and previous config saved to /var/cache/conftool/dbconfig/20240826-144153-ladsgroup.json
  • 14:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2013.codfw.wmnet
  • 14:41 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2013.codfw.wmnet
  • 14:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
  • 14:40 claime: homer 'lsw1-a5-codfw*' commit 'T372878'
  • 14:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2013.codfw.wmnet with OS bullseye
  • 14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T370903)', diff saved to https://phabricator.wikimedia.org/P67790 and previous config saved to /var/cache/conftool/dbconfig/20240826-143844-ladsgroup.json
  • 14:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67789 and previous config saved to /var/cache/conftool/dbconfig/20240826-143822-ladsgroup.json
  • 14:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
  • 14:36 Dreamy_Jazz: Started 6hr maximum scan on group2 - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 14:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
  • 14:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
  • 14:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
  • 14:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
  • 14:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P67788 and previous config saved to /var/cache/conftool/dbconfig/20240826-142315-ladsgroup.json
  • 14:21 claime: Running homer 'cr*codfw*' commit 'T372878'
  • 14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f33b17ddd90>
  • 14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2008
  • 14:20 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2008
  • 14:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2008.codfw.wmnet 196.16.192.10.in-addr.arpa 6.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:20 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2008.codfw.wmnet 196.16.192.10.in-addr.arpa 6.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2008 - cgoubert@cumin1002"
  • 14:20 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2008 - cgoubert@cumin1002"
  • 14:20 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
  • 14:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
  • 14:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:17 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f33b17ddd90>
  • 14:16 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2008.codfw.wmnet with OS bullseye
  • 14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2008.codfw.wmnet
  • 14:03 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:03 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f13a466bd60>
  • 14:02 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2004.codfw.wmnet with OS bullseye
  • 14:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2004.codfw.wmnet
  • 14:00 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2004.codfw.wmnet
  • 14:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2004.codfw.wmnet
  • 14:00 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2004.codfw.wmnet
  • 13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fc4fbcc0d30>
  • 13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2013
  • 13:59 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2013
  • 13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2013.codfw.wmnet 68.0.192.10.in-addr.arpa 8.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:59 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2013.codfw.wmnet 68.0.192.10.in-addr.arpa 8.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2013 - cgoubert@cumin1002"
  • 13:56 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2013 - cgoubert@cumin1002"
  • 13:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rpki2003.codfw.wmnet with reason: host reimage
  • 13:53 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 13:53 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fc4fbcc0d30>
  • 13:53 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2013.codfw.wmnet with OS bullseye
  • 13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67786 and previous config saved to /var/cache/conftool/dbconfig/20240826-135301-ladsgroup.json
  • 13:52 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on rpki2003.codfw.wmnet with reason: host reimage
  • 13:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2013.codfw.wmnet
  • 13:51 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2013.codfw.wmnet
  • 13:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67785 and previous config saved to /var/cache/conftool/dbconfig/20240826-135052-ladsgroup.json
  • 13:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T370903)', diff saved to https://phabricator.wikimedia.org/P67784 and previous config saved to /var/cache/conftool/dbconfig/20240826-135031-ladsgroup.json
  • 13:45 urbanecm@deploy1003: Finished scap sync-world: Backport for use shellbox-video globally (adding group2, including commons) (T356241) (duration: 08m 04s)
  • 13:45 Dreamy_Jazz: Started 6hr maximum scan on nowiki - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 13:41 urbanecm@deploy1003: hnowlan, urbanecm: Continuing with sync
  • 13:40 urbanecm@deploy1003: hnowlan, urbanecm: Backport for use shellbox-video globally (adding group2, including commons) (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:37 urbanecm@deploy1003: Started scap sync-world: Backport for use shellbox-video globally (adding group2, including commons) (T356241)
  • 13:36 urbanecm@deploy1003: Finished scap sync-world: Backport for Rollout Parsoid Kartographer support on all wikis (T342871), scripts: add script for running jobs from stdin rather than http (T369048) (duration: 26m 53s)
  • 13:35 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host rpki2003.codfw.wmnet with OS bookworm
  • 13:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P67783 and previous config saved to /var/cache/conftool/dbconfig/20240826-133524-ladsgroup.json
  • 13:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM rpki2003.codfw.wmnet - ayounsi@cumin1002"
  • 13:34 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM rpki2003.codfw.wmnet - ayounsi@cumin1002"
  • 13:34 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and A:cp for 9.2.5-1wm2
  • 13:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) rpki2003.codfw.wmnet on all recursors
  • 13:34 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache rpki2003.codfw.wmnet on all recursors
  • 13:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM rpki2003.codfw.wmnet - ayounsi@cumin1002"
  • 13:34 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM rpki2003.codfw.wmnet - ayounsi@cumin1002"
  • 13:30 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 13:30 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host rpki2003.codfw.wmnet
  • 13:29 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host rpki2003.codfw.wmnet
  • 13:29 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host rpki2003.codfw.wmnet
  • 13:28 urbanecm@deploy1003: hnowlan, urbanecm, ihurbain: Continuing with sync
  • 13:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67782 and previous config saved to /var/cache/conftool/dbconfig/20240826-132738-ladsgroup.json
  • 13:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 13:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 13:24 urbanecm@deploy1003: hnowlan, urbanecm, ihurbain: Backport for Rollout Parsoid Kartographer support on all wikis (T342871), scripts: add script for running jobs from stdin rather than http (T369048) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P67781 and previous config saved to /var/cache/conftool/dbconfig/20240826-132016-ladsgroup.json
  • 13:09 urbanecm@deploy1003: Started scap sync-world: Backport for Rollout Parsoid Kartographer support on all wikis (T342871), scripts: add script for running jobs from stdin rather than http (T369048)
  • 13:07 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 13:06 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 13:06 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:05 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T370903)', diff saved to https://phabricator.wikimedia.org/P67780 and previous config saved to /var/cache/conftool/dbconfig/20240826-130510-ladsgroup.json
  • 13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T370903)', diff saved to https://phabricator.wikimedia.org/P67779 and previous config saved to /var/cache/conftool/dbconfig/20240826-130401-ladsgroup.json
  • 13:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67778 and previous config saved to /var/cache/conftool/dbconfig/20240826-130350-ladsgroup.json
  • 12:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P67777 and previous config saved to /var/cache/conftool/dbconfig/20240826-124843-ladsgroup.json
  • 12:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P67776 and previous config saved to /var/cache/conftool/dbconfig/20240826-123336-ladsgroup.json
  • 12:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Weight db2214 T373174', diff saved to https://phabricator.wikimedia.org/P67775 and previous config saved to /var/cache/conftool/dbconfig/20240826-123205-arnaudb.json
  • 12:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2129 to s6 primary T373174', diff saved to https://phabricator.wikimedia.org/P67774 and previous config saved to /var/cache/conftool/dbconfig/20240826-122925-arnaudb.json
  • 12:28 arnaudb: Starting s6 codfw failover from db2214 to db2129 - T373174
  • 12:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: Testing
  • 12:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: Testing
  • 12:21 godog: move unused and about-to-expire cert puppetmaster1001:/var/lib/puppet/server/ssl/ca/signed/webperf.discovery.wmnet.pem to /root
  • 12:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67773 and previous config saved to /var/cache/conftool/dbconfig/20240826-121828-ladsgroup.json
  • 12:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268434
  • 12:17 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268434
  • 12:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263903
  • 12:17 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 263903
  • 12:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61754
  • 12:17 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61754
  • 12:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 269115
  • 12:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 269115
  • 12:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 274607
  • 12:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 274607
  • 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67772 and previous config saved to /var/cache/conftool/dbconfig/20240826-121419-ladsgroup.json
  • 12:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T370903)', diff saved to https://phabricator.wikimedia.org/P67771 and previous config saved to /var/cache/conftool/dbconfig/20240826-121408-ladsgroup.json
  • 12:12 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 12:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2129 with weight 0 T373174', diff saved to https://phabricator.wikimedia.org/P67770 and previous config saved to /var/cache/conftool/dbconfig/20240826-120921-arnaudb.json
  • 12:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s6 T373174
  • 12:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s6 T373174
  • 12:05 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P67769 and previous config saved to /var/cache/conftool/dbconfig/20240826-115901-ladsgroup.json
  • 11:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 11:53 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P67768 and previous config saved to /var/cache/conftool/dbconfig/20240826-114354-ladsgroup.json
  • 11:41 hashar@deploy1003: Finished deploy [integration/docroot@c3352dd]: build: update mediawiki/mediawiki-codesniffer to 44.0.0 and micromatch to 4.0.8 (duration: 00m 06s)
  • 11:41 hashar@deploy1003: Started deploy [integration/docroot@c3352dd]: build: update mediawiki/mediawiki-codesniffer to 44.0.0 and micromatch to 4.0.8
  • 11:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 11:29 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T370903)', diff saved to https://phabricator.wikimedia.org/P67767 and previous config saved to /var/cache/conftool/dbconfig/20240826-112847-ladsgroup.json
  • 11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T370903)', diff saved to https://phabricator.wikimedia.org/P67766 and previous config saved to /var/cache/conftool/dbconfig/20240826-112739-ladsgroup.json
  • 11:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:16 vgutierrez@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 11:13 vgutierrez@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 10:46 Dreamy_Jazz: Started a maximum 6 hr scan on ruwiki for MediaModeration - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 10:39 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 10:00 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 09:59 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 09:47 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:45 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:43 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:42 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:42 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 09:40 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 09:37 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test1003.wikimedia.org
  • 09:37 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:37 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 09:36 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 09:33 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 09:28 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp-test1003.wikimedia.org
  • 09:27 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp1003.wikimedia.org
  • 09:27 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:27 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 09:25 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 09:22 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 09:17 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp1003.wikimedia.org
  • 08:56 arnaudb@cumin1002: dbctl commit (dc=all): 'weight db2212 T373173', diff saved to https://phabricator.wikimedia.org/P67763 and previous config saved to /var/cache/conftool/dbconfig/20240826-085621-arnaudb.json
  • 08:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2203 to s1 primary T373173', diff saved to https://phabricator.wikimedia.org/P67762 and previous config saved to /var/cache/conftool/dbconfig/20240826-085048-arnaudb.json
  • 08:50 arnaudb: Starting s1 codfw failover from db2212 to db2203 - T373173
  • 08:49 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp2003.wikimedia.org
  • 08:49 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:49 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 08:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T373173 - repeat due to T373295
  • 08:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T373173 - repeat due to T373295
  • 08:48 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 08:45 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 08:40 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp2003.wikimedia.org
  • 08:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Primary switchover s1 node in failure
  • 08:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Primary switchover s1 node in failure
  • 08:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 depool', diff saved to https://phabricator.wikimedia.org/P67760 and previous config saved to /var/cache/conftool/dbconfig/20240826-081753-arnaudb.json
  • 07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2203 with weight 0 T373173', diff saved to https://phabricator.wikimedia.org/P67758 and previous config saved to /var/cache/conftool/dbconfig/20240826-074113-arnaudb.json
  • 07:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T373173
  • 07:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T373173
  • 07:21 arnaudb@cumin1002: dbctl commit (dc=all): 'rebalance weights T373168', diff saved to https://phabricator.wikimedia.org/P67757 and previous config saved to /var/cache/conftool/dbconfig/20240826-072119-arnaudb.json
  • 07:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es2039 to es7 primary and set section read-write T373168', diff saved to https://phabricator.wikimedia.org/P67756 and previous config saved to /var/cache/conftool/dbconfig/20240826-072028-arnaudb.json
  • 07:19 arnaudb: Starting es7 codfw failover from es2038 to es2039 - T373168
  • 07:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es2039 with weight 0 T373168', diff saved to https://phabricator.wikimedia.org/P67755 and previous config saved to /var/cache/conftool/dbconfig/20240826-071504-arnaudb.json
  • 07:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es7 T373168
  • 07:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es7 T373168
  • 06:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
  • 06:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 32934

2024-08-25

  • 15:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T367856)', diff saved to https://phabricator.wikimedia.org/P67754 and previous config saved to /var/cache/conftool/dbconfig/20240825-153206-marostegui.json
  • 15:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 15:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 15:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367856)', diff saved to https://phabricator.wikimedia.org/P67753 and previous config saved to /var/cache/conftool/dbconfig/20240825-153144-marostegui.json
  • 15:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P67752 and previous config saved to /var/cache/conftool/dbconfig/20240825-151637-marostegui.json
  • 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P67751 and previous config saved to /var/cache/conftool/dbconfig/20240825-150130-marostegui.json
  • 14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367856)', diff saved to https://phabricator.wikimedia.org/P67750 and previous config saved to /var/cache/conftool/dbconfig/20240825-144623-marostegui.json
  • 08:05 oblivian@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67749 and previous config saved to /var/cache/conftool/dbconfig/20240825-080544-oblivian.json
  • 07:50 oblivian@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67748 and previous config saved to /var/cache/conftool/dbconfig/20240825-075038-oblivian.json
  • 07:35 oblivian@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67747 and previous config saved to /var/cache/conftool/dbconfig/20240825-073533-oblivian.json
  • 07:20 oblivian@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67746 and previous config saved to /var/cache/conftool/dbconfig/20240825-072027-oblivian.json
  • 07:05 oblivian@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67745 and previous config saved to /var/cache/conftool/dbconfig/20240825-070522-oblivian.json
  • 06:57 _joe_: repairing mgwiktionary.pagelinks on db1161
  • 06:12 oblivian@cumin1002: dbctl commit (dc=all): 'depooling db1161, broken replica', diff saved to https://phabricator.wikimedia.org/P67744 and previous config saved to /var/cache/conftool/dbconfig/20240825-061206-oblivian.json

2024-08-24

2024-08-23

  • 22:26 eileen: civicrm upgraded from e629834c to 75c86184 (the upgrade turned out not to include anything relevant to the new deduper error)
  • 16:50 conniecc1@deploy1003: Finished deploy [airflow-dags/analytics_product@c55c7de]: (no justification provided) (duration: 00m 03s)
  • 16:50 conniecc1@deploy1003: Started deploy [airflow-dags/analytics_product@c55c7de]: (no justification provided)
  • 16:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T371742)', diff saved to https://phabricator.wikimedia.org/P67740 and previous config saved to /var/cache/conftool/dbconfig/20240823-164554-ladsgroup.json
  • 16:45 nettrom@deploy1003: Finished deploy [airflow-dags/analytics_product@c55c7de]: (no justification provided) (duration: 00m 17s)
  • 16:45 nettrom@deploy1003: Started deploy [airflow-dags/analytics_product@c55c7de]: (no justification provided)
  • 16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating mgmt for frack servers in codfw - jhancock@cumin2002"
  • 16:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating mgmt for frack servers in codfw - jhancock@cumin2002"
  • 16:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:33 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 16:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 16:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P67738 and previous config saved to /var/cache/conftool/dbconfig/20240823-163047-ladsgroup.json
  • 16:19 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bookworm
  • 16:16 bearloga@deploy1003: Finished deploy [airflow-dags/wmde@c55c7de]: (no justification provided) (duration: 00m 06s)
  • 16:16 bearloga@deploy1003: Started deploy [airflow-dags/wmde@c55c7de]: (no justification provided)
  • 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P67737 and previous config saved to /var/cache/conftool/dbconfig/20240823-161540-ladsgroup.json
  • 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T371742)', diff saved to https://phabricator.wikimedia.org/P67736 and previous config saved to /var/cache/conftool/dbconfig/20240823-160033-ladsgroup.json
  • 15:59 claime: Running homer 'cr*codfw*' commit T372878
  • 15:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2027.codfw.wmnet
  • 15:59 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2027.codfw.wmnet
  • 15:54 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 15:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 17:00:00 on wdqs[1023-1024].eqiad.wmnet with reason: noisy alerts related to graph split T337013
  • 15:52 bking@cumin2002: START - Cookbook sre.hosts.downtime for 17:00:00 on wdqs[1023-1024].eqiad.wmnet with reason: noisy alerts related to graph split T337013
  • 15:52 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 15:38 claime: Running homer 'lsw1-a6-codfw*' commit T372878
  • 15:35 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 15:35 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 15:33 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bookworm
  • 15:32 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1001.eqiad.wmnet with OS bookworm
  • 15:29 jgleeson: updated civicrm from 975fc66e to e629834c
  • 15:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T367856)', diff saved to https://phabricator.wikimedia.org/P67735 and previous config saved to /var/cache/conftool/dbconfig/20240823-151730-marostegui.json
  • 15:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 14:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 14:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 15:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 15:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67734 and previous config saved to /var/cache/conftool/dbconfig/20240823-151704-marostegui.json
  • 15:11 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bookworm
  • 15:09 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1002.eqiad.wmnet with OS bookworm
  • 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P67733 and previous config saved to /var/cache/conftool/dbconfig/20240823-150156-marostegui.json
  • 14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P67732 and previous config saved to /var/cache/conftool/dbconfig/20240823-144649-marostegui.json
  • 14:45 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
  • 14:42 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
  • 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T370903)', diff saved to https://phabricator.wikimedia.org/P67731 and previous config saved to /var/cache/conftool/dbconfig/20240823-143952-ladsgroup.json
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2027.codfw.wmnet with OS bullseye
  • 14:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67730 and previous config saved to /var/cache/conftool/dbconfig/20240823-143140-marostegui.json
  • 14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P67729 and previous config saved to /var/cache/conftool/dbconfig/20240823-142445-ladsgroup.json
  • 14:22 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bookworm
  • 14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T371742)', diff saved to https://phabricator.wikimedia.org/P67728 and previous config saved to /var/cache/conftool/dbconfig/20240823-141841-ladsgroup.json
  • 14:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2007.codfw.wmnet
  • 14:18 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2007.codfw.wmnet
  • 14:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 14:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T371742)', diff saved to https://phabricator.wikimedia.org/P67727 and previous config saved to /var/cache/conftool/dbconfig/20240823-141819-ladsgroup.json
  • 14:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P67725 and previous config saved to /var/cache/conftool/dbconfig/20240823-140312-ladsgroup.json
  • 14:01 claime: Running homer 'cr*codfw*' commit 'T372878'
  • 13:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f3c3b32f220>
  • 13:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2027
  • 13:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
  • 13:55 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2027
  • 13:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2027.codfw.wmnet 176.0.192.10.in-addr.arpa 6.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:55 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2027.codfw.wmnet 176.0.192.10.in-addr.arpa 6.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2027 - cgoubert@cumin1002"
  • 13:55 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2027 - cgoubert@cumin1002"
  • 13:54 stran@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T370903)', diff saved to https://phabricator.wikimedia.org/P67724 and previous config saved to /var/cache/conftool/dbconfig/20240823-135431-ladsgroup.json
  • 13:54 milimetric@deploy1003: Finished deploy [analytics/refinery@e5d0d48] (thin): Special deploy to make sure sqoop logic matches schema change (duration: 04m 48s)
  • 13:54 stran@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 13:53 stran@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 13:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
  • 13:52 stran@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 13:52 stran@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 13:52 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 13:51 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f3c3b32f220>
  • 13:51 stran@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 13:51 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2027.codfw.wmnet with OS bullseye
  • 13:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2027.codfw.wmnet
  • 13:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2027.codfw.wmnet
  • 13:49 milimetric@deploy1003: Started deploy [analytics/refinery@e5d0d48] (thin): Special deploy to make sure sqoop logic matches schema change
  • 13:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P67723 and previous config saved to /var/cache/conftool/dbconfig/20240823-134805-ladsgroup.json
  • 13:42 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:37 milimetric@deploy1003: Finished deploy [analytics/refinery@e5d0d48]: Special deploy to make sure sqoop logic matches schema change (duration: 07m 22s)
  • 13:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7ffa0cd98d60>
  • 13:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2007
  • 13:35 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2007
  • 13:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2007.codfw.wmnet 195.16.192.10.in-addr.arpa 5.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:34 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2007.codfw.wmnet 195.16.192.10.in-addr.arpa 5.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2007 - cgoubert@cumin1002"
  • 13:34 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2007 - cgoubert@cumin1002"
  • 13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T371742)', diff saved to https://phabricator.wikimedia.org/P67722 and previous config saved to /var/cache/conftool/dbconfig/20240823-133258-ladsgroup.json
  • 13:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Decom next week
  • 13:32 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Decom next week
  • 13:31 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 13:31 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7ffa0cd98d60>
  • 13:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2007.codfw.wmnet with OS bullseye
  • 13:30 milimetric@deploy1003: Started deploy [analytics/refinery@e5d0d48]: Special deploy to make sure sqoop logic matches schema change
  • 13:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2007.codfw.wmnet
  • 13:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T370903)', diff saved to https://phabricator.wikimedia.org/P67721 and previous config saved to /var/cache/conftool/dbconfig/20240823-132838-ladsgroup.json
  • 13:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 13:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 13:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67720 and previous config saved to /var/cache/conftool/dbconfig/20240823-132804-ladsgroup.json
  • 13:25 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2007.codfw.wmnet
  • 13:21 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1003.eqiad.wmnet with OS bookworm
  • 13:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2033.codfw.wmnet
  • 13:18 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2033.codfw.wmnet
  • 13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker2033.codfw.wmnet
  • 13:17 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2033.codfw.wmnet
  • 13:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P67719 and previous config saved to /var/cache/conftool/dbconfig/20240823-131257-ladsgroup.json
  • 13:10 claime: Running homer 'cr*codfw*' commit 'T372878'
  • 13:09 milimetric@deploy1003: Finished deploy [analytics/refinery@e5d0d48]: Special deploy to make sure sqoop logic matches schema change (duration: 01m 57s)
  • 13:09 claime: Running homer 'lsw1-a3-codfw*' commit 'T372878'
  • 13:07 milimetric@deploy1003: Started deploy [analytics/refinery@e5d0d48]: Special deploy to make sure sqoop logic matches schema change
  • 12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P67718 and previous config saved to /var/cache/conftool/dbconfig/20240823-125750-ladsgroup.json
  • 12:57 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
  • 12:54 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
  • 12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67717 and previous config saved to /var/cache/conftool/dbconfig/20240823-124243-ladsgroup.json
  • 12:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:34 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1003.eqiad.wmnet with OS bookworm
  • 12:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 12:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 12:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1004.eqiad.wmnet with OS bookworm
  • 12:20 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the (test) switch
  • 12:20 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 12:17 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
  • 12:17 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 12:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67716 and previous config saved to /var/cache/conftool/dbconfig/20240823-121653-ladsgroup.json
  • 12:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 12:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 12:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T370903)', diff saved to https://phabricator.wikimedia.org/P67715 and previous config saved to /var/cache/conftool/dbconfig/20240823-121631-ladsgroup.json
  • 12:16 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
  • 12:16 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
  • 12:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2033.codfw.wmnet with OS bullseye
  • 12:08 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
  • 12:04 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
  • 12:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P67714 and previous config saved to /var/cache/conftool/dbconfig/20240823-120124-ladsgroup.json
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
  • 11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T371742)', diff saved to https://phabricator.wikimedia.org/P67713 and previous config saved to /var/cache/conftool/dbconfig/20240823-115358-ladsgroup.json
  • 11:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 11:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67712 and previous config saved to /var/cache/conftool/dbconfig/20240823-115336-ladsgroup.json
  • 11:52 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
  • 11:48 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch from test-s1 to test-s1
  • 11:48 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch from test-s1 to test-s1
  • 11:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P67711 and previous config saved to /var/cache/conftool/dbconfig/20240823-114616-ladsgroup.json
  • 11:44 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bookworm
  • 11:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P67710 and previous config saved to /var/cache/conftool/dbconfig/20240823-113829-ladsgroup.json
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fa9a1bb7d00>
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2033
  • 11:35 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2033
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2033.codfw.wmnet 55.0.192.10.in-addr.arpa 5.5.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:35 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2033.codfw.wmnet 55.0.192.10.in-addr.arpa 5.5.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2033 - cgoubert@cumin1002"
  • 11:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2033 - cgoubert@cumin1002"
  • 11:32 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2003.codfw.wmnet
  • 11:32 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2003.codfw.wmnet
  • 11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fa9a1bb7d00>
  • 11:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T370903)', diff saved to https://phabricator.wikimedia.org/P67709 and previous config saved to /var/cache/conftool/dbconfig/20240823-113109-ladsgroup.json
  • 11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2033.codfw.wmnet with OS bullseye
  • 11:30 claime: Running homer 'lsw1-b3-codfw*' commit 'T372878'
  • 11:28 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1004.eqiad.wmnet with OS bookworm
  • 11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2003.codfw.wmnet with OS bullseye
  • 11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2033.codfw.wmnet
  • 11:28 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2033.codfw.wmnet
  • 11:27 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch from test-s1 to test-s1
  • 11:27 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch from test-s1 to test-s1
  • 11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2026.codfw.wmnet
  • 11:27 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2026.codfw.wmnet
  • 11:23 claime: Running homer 'lsw1-a3-codfw*' commit 'T372878'
  • 11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P67708 and previous config saved to /var/cache/conftool/dbconfig/20240823-112320-ladsgroup.json
  • 11:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 11:16 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bookworm
  • 11:16 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1004.eqiad.wmnet with OS bookworm
  • 11:16 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch from test-s1 to test-s1
  • 11:16 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch from test-s1 to test-s1
  • 11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
  • 11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67707 and previous config saved to /var/cache/conftool/dbconfig/20240823-110813-ladsgroup.json
  • 11:07 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 11:07 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bookworm
  • 11:05 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
  • 11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 11:05 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
  • 11:03 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
  • 11:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 11:01 claime: Running homer 'cr*codfw*' commit 'T372878'
  • 10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T370903)', diff saved to https://phabricator.wikimedia.org/P67706 and previous config saved to /var/cache/conftool/dbconfig/20240823-105938-ladsgroup.json
  • 10:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 10:58 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 10:56 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 10:55 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 10:55 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 10:55 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:54 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 10:54 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2012.codfw.wmnet
  • 10:53 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2012.codfw.wmnet
  • 10:53 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:53 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f0f758a2d30>
  • 10:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2003
  • 10:47 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2003
  • 10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2003.codfw.wmnet 177.16.192.10.in-addr.arpa 7.7.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2003.codfw.wmnet 177.16.192.10.in-addr.arpa 7.7.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2003 - cgoubert@cumin1002"
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2003 - cgoubert@cumin1002"
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2012.codfw.wmnet with OS bullseye
  • 10:42 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:42 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f0f758a2d30>
  • 10:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2003.codfw.wmnet with OS bullseye
  • 10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f0fc17f5d00>
  • 10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2026
  • 10:41 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2026
  • 10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2026.codfw.wmnet 170.0.192.10.in-addr.arpa 0.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2026.codfw.wmnet 170.0.192.10.in-addr.arpa 0.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2026 - cgoubert@cumin1002"
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2026 - cgoubert@cumin1002"
  • 10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2003.codfw.wmnet
  • 10:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2003.codfw.wmnet
  • 10:37 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
  • 10:36 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:36 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f0fc17f5d00>
  • 10:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2026.codfw.wmnet
  • 10:35 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2026.codfw.wmnet
  • 10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2025.codfw.wmnet
  • 10:34 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2025.codfw.wmnet
  • 10:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2025.codfw.wmnet with OS bullseye
  • 10:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 10:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T370903)', diff saved to https://phabricator.wikimedia.org/P67705 and previous config saved to /var/cache/conftool/dbconfig/20240823-103006-ladsgroup.json
  • 10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
  • 10:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
  • 10:19 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
  • 10:17 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P67704 and previous config saved to /var/cache/conftool/dbconfig/20240823-101459-ladsgroup.json
  • 10:14 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 10:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
  • 10:09 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
  • 10:07 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
  • 10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f00474004c0>
  • 10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2012
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2012
  • 10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2012.codfw.wmnet 67.0.192.10.in-addr.arpa 7.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2012.codfw.wmnet 67.0.192.10.in-addr.arpa 7.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2012 - cgoubert@cumin1002"
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2012 - cgoubert@cumin1002"
  • 10:04 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 10:04 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 10:01 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:01 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f00474004c0>
  • 10:00 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2012.codfw.wmnet with OS bullseye
  • 10:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2012.codfw.wmnet
  • 09:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P67702 and previous config saved to /var/cache/conftool/dbconfig/20240823-095952-ladsgroup.json
  • 09:59 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2012.codfw.wmnet
  • 09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f4f1d59cdf0>
  • 09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2025
  • 09:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2025
  • 09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2025.codfw.wmnet 168.0.192.10.in-addr.arpa 8.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:49 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2025.codfw.wmnet 168.0.192.10.in-addr.arpa 8.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2025 - cgoubert@cumin1002"
  • 09:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2025 - cgoubert@cumin1002"
  • 09:49 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 09:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T370903)', diff saved to https://phabricator.wikimedia.org/P67701 and previous config saved to /var/cache/conftool/dbconfig/20240823-094445-ladsgroup.json
  • 09:42 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 09:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:39 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f4f1d59cdf0>
  • 09:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2025.codfw.wmnet with OS bullseye
  • 09:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2025.codfw.wmnet
  • 09:38 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2025.codfw.wmnet
  • 09:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67700 and previous config saved to /var/cache/conftool/dbconfig/20240823-093050-ladsgroup.json
  • 09:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T371742)', diff saved to https://phabricator.wikimedia.org/P67699 and previous config saved to /var/cache/conftool/dbconfig/20240823-093028-ladsgroup.json
  • 09:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P67698 and previous config saved to /var/cache/conftool/dbconfig/20240823-091521-ladsgroup.json
  • 09:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T370903)', diff saved to https://phabricator.wikimedia.org/P67697 and previous config saved to /var/cache/conftool/dbconfig/20240823-091251-ladsgroup.json
  • 09:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T370903)', diff saved to https://phabricator.wikimedia.org/P67696 and previous config saved to /var/cache/conftool/dbconfig/20240823-091229-ladsgroup.json
  • 09:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P67695 and previous config saved to /var/cache/conftool/dbconfig/20240823-090014-ladsgroup.json
  • 08:59 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 08:59 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 08:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P67694 and previous config saved to /var/cache/conftool/dbconfig/20240823-085722-ladsgroup.json
  • 08:54 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 08:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T371742)', diff saved to https://phabricator.wikimedia.org/P67693 and previous config saved to /var/cache/conftool/dbconfig/20240823-084506-ladsgroup.json
  • 08:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P67692 and previous config saved to /var/cache/conftool/dbconfig/20240823-084214-ladsgroup.json
  • 08:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T370903)', diff saved to https://phabricator.wikimedia.org/P67691 and previous config saved to /var/cache/conftool/dbconfig/20240823-082707-ladsgroup.json
  • 08:17 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 08:08 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 07:58 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 07:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T370903)', diff saved to https://phabricator.wikimedia.org/P67690 and previous config saved to /var/cache/conftool/dbconfig/20240823-075415-ladsgroup.json
  • 07:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 07:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 07:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T370903)', diff saved to https://phabricator.wikimedia.org/P67689 and previous config saved to /var/cache/conftool/dbconfig/20240823-075353-ladsgroup.json
  • 07:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P67688 and previous config saved to /var/cache/conftool/dbconfig/20240823-073846-ladsgroup.json
  • 07:27 godog: start prometheus1006 bookworm upgrade - T326657
  • 07:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P67687 and previous config saved to /var/cache/conftool/dbconfig/20240823-072339-ladsgroup.json
  • 07:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T370903)', diff saved to https://phabricator.wikimedia.org/P67686 and previous config saved to /var/cache/conftool/dbconfig/20240823-070832-ladsgroup.json
  • 06:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T371742)', diff saved to https://phabricator.wikimedia.org/P67685 and previous config saved to /var/cache/conftool/dbconfig/20240823-065819-ladsgroup.json
  • 06:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 06:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T371742)', diff saved to https://phabricator.wikimedia.org/P67684 and previous config saved to /var/cache/conftool/dbconfig/20240823-065756-ladsgroup.json
  • 06:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P67683 and previous config saved to /var/cache/conftool/dbconfig/20240823-064249-ladsgroup.json
  • 06:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T370903)', diff saved to https://phabricator.wikimedia.org/P67682 and previous config saved to /var/cache/conftool/dbconfig/20240823-063539-ladsgroup.json
  • 06:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T370903)', diff saved to https://phabricator.wikimedia.org/P67681 and previous config saved to /var/cache/conftool/dbconfig/20240823-063502-ladsgroup.json
  • 06:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P67680 and previous config saved to /var/cache/conftool/dbconfig/20240823-062742-ladsgroup.json
  • 06:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set cephosd1005 to failed in Netbox - ayounsi@cumin1002"
  • 06:19 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set cephosd1005 to failed in Netbox - ayounsi@cumin1002"
  • 06:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P67679 and previous config saved to /var/cache/conftool/dbconfig/20240823-061954-ladsgroup.json
  • 06:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T371742)', diff saved to https://phabricator.wikimedia.org/P67678 and previous config saved to /var/cache/conftool/dbconfig/20240823-061235-ladsgroup.json
  • 06:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P67677 and previous config saved to /var/cache/conftool/dbconfig/20240823-060447-ladsgroup.json
  • 05:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T370903)', diff saved to https://phabricator.wikimedia.org/P67676 and previous config saved to /var/cache/conftool/dbconfig/20240823-054940-ladsgroup.json
  • 05:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T370903)', diff saved to https://phabricator.wikimedia.org/P67675 and previous config saved to /var/cache/conftool/dbconfig/20240823-051718-ladsgroup.json
  • 05:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 05:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 04:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 04:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 04:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T370903)', diff saved to https://phabricator.wikimedia.org/P67674 and previous config saved to /var/cache/conftool/dbconfig/20240823-044132-ladsgroup.json
  • 04:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P67673 and previous config saved to /var/cache/conftool/dbconfig/20240823-042625-ladsgroup.json
  • 04:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T371742)', diff saved to https://phabricator.wikimedia.org/P67672 and previous config saved to /var/cache/conftool/dbconfig/20240823-042531-ladsgroup.json
  • 04:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T371742)', diff saved to https://phabricator.wikimedia.org/P67671 and previous config saved to /var/cache/conftool/dbconfig/20240823-042454-ladsgroup.json
  • 04:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P67670 and previous config saved to /var/cache/conftool/dbconfig/20240823-041118-ladsgroup.json
  • 04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P67669 and previous config saved to /var/cache/conftool/dbconfig/20240823-040947-ladsgroup.json
  • 03:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T370903)', diff saved to https://phabricator.wikimedia.org/P67668 and previous config saved to /var/cache/conftool/dbconfig/20240823-035611-ladsgroup.json
  • 03:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P67667 and previous config saved to /var/cache/conftool/dbconfig/20240823-035439-ladsgroup.json
  • 03:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T371742)', diff saved to https://phabricator.wikimedia.org/P67666 and previous config saved to /var/cache/conftool/dbconfig/20240823-033932-ladsgroup.json
  • 03:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T370903)', diff saved to https://phabricator.wikimedia.org/P67665 and previous config saved to /var/cache/conftool/dbconfig/20240823-032642-ladsgroup.json
  • 03:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T370903)', diff saved to https://phabricator.wikimedia.org/P67664 and previous config saved to /var/cache/conftool/dbconfig/20240823-032620-ladsgroup.json
  • 03:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P67663 and previous config saved to /var/cache/conftool/dbconfig/20240823-031113-ladsgroup.json
  • 02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P67662 and previous config saved to /var/cache/conftool/dbconfig/20240823-025605-ladsgroup.json
  • 02:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T370903)', diff saved to https://phabricator.wikimedia.org/P67661 and previous config saved to /var/cache/conftool/dbconfig/20240823-024058-ladsgroup.json
  • 02:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T370903)', diff saved to https://phabricator.wikimedia.org/P67660 and previous config saved to /var/cache/conftool/dbconfig/20240823-021231-ladsgroup.json
  • 02:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 02:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T371742)', diff saved to https://phabricator.wikimedia.org/P67659 and previous config saved to /var/cache/conftool/dbconfig/20240823-015417-ladsgroup.json
  • 01:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T370903)', diff saved to https://phabricator.wikimedia.org/P67658 and previous config saved to /var/cache/conftool/dbconfig/20240823-014706-ladsgroup.json
  • 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P67657 and previous config saved to /var/cache/conftool/dbconfig/20240823-013158-ladsgroup.json
  • 01:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P67656 and previous config saved to /var/cache/conftool/dbconfig/20240823-011651-ladsgroup.json
  • 01:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T370903)', diff saved to https://phabricator.wikimedia.org/P67655 and previous config saved to /var/cache/conftool/dbconfig/20240823-010144-ladsgroup.json
  • 00:49 krinkle@deploy1003: Finished deploy [integration/docroot@da4dac4]: (no justification provided) (duration: 00m 06s)
  • 00:49 krinkle@deploy1003: Started deploy [integration/docroot@da4dac4]: (no justification provided)
  • 00:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T370903)', diff saved to https://phabricator.wikimedia.org/P67653 and previous config saved to /var/cache/conftool/dbconfig/20240823-003815-ladsgroup.json
  • 00:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T370903)', diff saved to https://phabricator.wikimedia.org/P67652 and previous config saved to /var/cache/conftool/dbconfig/20240823-003753-ladsgroup.json
  • 00:28 andrewbogott: rebooting puppetserver1003.eqiad.wmnet from mgmt console; it's unresponsive and causing puppet errors on clients.
  • 00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67651 and previous config saved to /var/cache/conftool/dbconfig/20240823-002245-ladsgroup.json
  • 00:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 00:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 00:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T371742)', diff saved to https://phabricator.wikimedia.org/P67650 and previous config saved to /var/cache/conftool/dbconfig/20240823-001219-ladsgroup.json
  • 00:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67649 and previous config saved to /var/cache/conftool/dbconfig/20240823-000738-ladsgroup.json

2024-08-22

  • 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P67648 and previous config saved to /var/cache/conftool/dbconfig/20240822-235711-ladsgroup.json
  • 23:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T370903)', diff saved to https://phabricator.wikimedia.org/P67647 and previous config saved to /var/cache/conftool/dbconfig/20240822-235231-ladsgroup.json
  • 23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P67646 and previous config saved to /var/cache/conftool/dbconfig/20240822-234203-ladsgroup.json
  • 23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T371742)', diff saved to https://phabricator.wikimedia.org/P67645 and previous config saved to /var/cache/conftool/dbconfig/20240822-232656-ladsgroup.json
  • 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T370903)', diff saved to https://phabricator.wikimedia.org/P67644 and previous config saved to /var/cache/conftool/dbconfig/20240822-224921-ladsgroup.json
  • 22:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T370903)', diff saved to https://phabricator.wikimedia.org/P67643 and previous config saved to /var/cache/conftool/dbconfig/20240822-224859-ladsgroup.json
  • 22:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67642 and previous config saved to /var/cache/conftool/dbconfig/20240822-223351-ladsgroup.json
  • 22:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67641 and previous config saved to /var/cache/conftool/dbconfig/20240822-221844-ladsgroup.json
  • 22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T370903)', diff saved to https://phabricator.wikimedia.org/P67640 and previous config saved to /var/cache/conftool/dbconfig/20240822-220337-ladsgroup.json
  • 21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2127 (T371742)', diff saved to https://phabricator.wikimedia.org/P67639 and previous config saved to /var/cache/conftool/dbconfig/20240822-213909-ladsgroup.json
  • 21:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T370903)', diff saved to https://phabricator.wikimedia.org/P67638 and previous config saved to /var/cache/conftool/dbconfig/20240822-213406-ladsgroup.json
  • 21:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 21:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 21:32 brennen@deploy1003: Finished scap sync-world: Backport for Turn on Parsoid read views for cswikivoyage and rowikivoyage (T371353) (duration: 09m 36s)
  • 21:28 brennen@deploy1003: brennen, cscott: Continuing with sync
  • 21:27 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:26 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 21:25 brennen@deploy1003: brennen, cscott: Backport for Turn on Parsoid read views for cswikivoyage and rowikivoyage (T371353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:22 brennen@deploy1003: Started scap sync-world: Backport for Turn on Parsoid read views for cswikivoyage and rowikivoyage (T371353)
  • 21:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T370903)', diff saved to https://phabricator.wikimedia.org/P67637 and previous config saved to /var/cache/conftool/dbconfig/20240822-210025-ladsgroup.json
  • 20:47 mutante: dzahn@cumin2002 conftool action : set/pooled=no; selector: name=ml-serve2002.codfw.wmnet T365291
  • 20:46 dzahn@cumin2002: conftool action : set/pooled=no; selector: name=ml-serve2002.codfw.wmnet
  • 20:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P67636 and previous config saved to /var/cache/conftool/dbconfig/20240822-204518-ladsgroup.json
  • 20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P67635 and previous config saved to /var/cache/conftool/dbconfig/20240822-203010-ladsgroup.json
  • 20:26 cdanis@deploy1003: Finished scap sync-world: Backport for Revert "Invert logic on empty talk page" (T373100) (duration: 07m 16s)
  • 20:21 cdanis@deploy1003: matmarex, cdanis: Continuing with sync
  • 20:21 cdanis@deploy1003: matmarex, cdanis: Backport for Revert "Invert logic on empty talk page" (T373100) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 cdanis@deploy1003: Started scap sync-world: Backport for Revert "Invert logic on empty talk page" (T373100)
  • 20:19 swfrench-wmf: imported wikidiff2_1.14.1-2+wmf11u2 into component/php81 - T372507
  • 20:18 swfrench-wmf: imported php-wmerrors_2.0.0-1+wmf11u2 into component/php81 - T372507
  • 20:17 swfrench-wmf: imported php-luasandbox_4.1.2-1+wmf11u2 into component/php81 - T372507
  • 20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T370903)', diff saved to https://phabricator.wikimedia.org/P67634 and previous config saved to /var/cache/conftool/dbconfig/20240822-201503-ladsgroup.json
  • 20:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T370903)', diff saved to https://phabricator.wikimedia.org/P67633 and previous config saved to /var/cache/conftool/dbconfig/20240822-194830-ladsgroup.json
  • 19:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 19:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 19:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P67632 and previous config saved to /var/cache/conftool/dbconfig/20240822-194808-ladsgroup.json
  • 19:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P67631 and previous config saved to /var/cache/conftool/dbconfig/20240822-193301-ladsgroup.json
  • 19:31 ryankemper: T364368 Pooled wdqs2024 (its data transfer has completed successfully)
  • 19:30 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-scholarly
  • 19:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 19:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P67630 and previous config saved to /var/cache/conftool/dbconfig/20240822-191754-ladsgroup.json
  • 19:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P67629 and previous config saved to /var/cache/conftool/dbconfig/20240822-190247-ladsgroup.json
  • 19:01 ryankemper: T364368 Pooled all wdqs main/scholarly hosts except wdqs2024, which won't be ready for another hour
  • 19:01 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-scholarly
  • 18:57 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-main
  • 18:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T371742)', diff saved to https://phabricator.wikimedia.org/P67628 and previous config saved to /var/cache/conftool/dbconfig/20240822-184628-ladsgroup.json
  • 18:36 sukhe@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=97) Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
  • 18:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 18:36 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
  • 18:36 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp40[37-40]* or cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
  • 18:35 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 11s)
  • 18:35 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 18:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P67627 and previous config saved to /var/cache/conftool/dbconfig/20240822-183120-ladsgroup.json
  • 18:19 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on wdqs2024.codfw.wmnet with reason: needs a data transfer
  • 18:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2024.codfw.wmnet with reason: needs a data transfer
  • 18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P67626 and previous config saved to /var/cache/conftool/dbconfig/20240822-181613-ladsgroup.json
  • 18:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P67625 and previous config saved to /var/cache/conftool/dbconfig/20240822-180230-ladsgroup.json
  • 18:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 18:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 18:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T370903)', diff saved to https://phabricator.wikimedia.org/P67624 and previous config saved to /var/cache/conftool/dbconfig/20240822-180208-ladsgroup.json
  • 18:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T371742)', diff saved to https://phabricator.wikimedia.org/P67623 and previous config saved to /var/cache/conftool/dbconfig/20240822-180106-ladsgroup.json
  • 17:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P67622 and previous config saved to /var/cache/conftool/dbconfig/20240822-174701-ladsgroup.json
  • 17:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet [reason: cookbook had failed as Puppet was disabled so pooling manually]
  • 17:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P67621 and previous config saved to /var/cache/conftool/dbconfig/20240822-173153-ladsgroup.json
  • 17:24 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp40[37-40]* or cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
  • 17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T371742)', diff saved to https://phabricator.wikimedia.org/P67620 and previous config saved to /var/cache/conftool/dbconfig/20240822-172404-ladsgroup.json
  • 17:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 17:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 17:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T371742)', diff saved to https://phabricator.wikimedia.org/P67619 and previous config saved to /var/cache/conftool/dbconfig/20240822-172342-ladsgroup.json
  • 17:20 sukhe: sudo cumin -b11 "A:cp" "run-puppet-agent" rolling out CR 1064797: T370294
  • 17:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T370903)', diff saved to https://phabricator.wikimedia.org/P67618 and previous config saved to /var/cache/conftool/dbconfig/20240822-171646-ladsgroup.json
  • 17:11 cdanis: 💙cdanis@cumin1002.eqiad.wmnet ~ 🕐☕ sudo ipmitool -I lanplus -H "puppetserver1002.mgmt.eqiad.wmnet" -U root -E chassis power cycle
  • 17:10 topranks: removing no-longer-required vlans from ssw1-a1-codfw after lvs move T370927
  • 17:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P67617 and previous config saved to /var/cache/conftool/dbconfig/20240822-170835-ladsgroup.json
  • 16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P67616 and previous config saved to /var/cache/conftool/dbconfig/20240822-165328-ladsgroup.json
  • 16:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T370903)', diff saved to https://phabricator.wikimedia.org/P67615 and previous config saved to /var/cache/conftool/dbconfig/20240822-164505-ladsgroup.json
  • 16:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T370903)', diff saved to https://phabricator.wikimedia.org/P67614 and previous config saved to /var/cache/conftool/dbconfig/20240822-164443-ladsgroup.json
  • 16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T371742)', diff saved to https://phabricator.wikimedia.org/P67613 and previous config saved to /var/cache/conftool/dbconfig/20240822-163819-ladsgroup.json
  • 16:35 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2013.codfw.wmnet with OS bullseye
  • 16:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P67612 and previous config saved to /var/cache/conftool/dbconfig/20240822-162936-ladsgroup.json
  • 16:27 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin -b11 'A:cp' 'run-puppet-agent --enable "merging CR 1064782"'
  • 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P67611 and previous config saved to /var/cache/conftool/dbconfig/20240822-161429-ladsgroup.json
  • 16:11 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
  • 16:09 sukhe@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=1) Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
  • 16:07 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
  • 16:05 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'disable-puppet' 'merging CR 1064782'
  • 16:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 16:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 16:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T371742)', diff saved to https://phabricator.wikimedia.org/P67610 and previous config saved to /var/cache/conftool/dbconfig/20240822-160131-ladsgroup.json
  • 16:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 16:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T371742)', diff saved to https://phabricator.wikimedia.org/P67609 and previous config saved to /var/cache/conftool/dbconfig/20240822-160052-ladsgroup.json
  • 15:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T370903)', diff saved to https://phabricator.wikimedia.org/P67608 and previous config saved to /var/cache/conftool/dbconfig/20240822-155921-ladsgroup.json
  • 15:50 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host lvs2013.codfw.wmnet with OS bullseye
  • 15:48 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 15:46 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lvs2013.codfw.wmnet on all recursors
  • 15:46 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache lvs2013.codfw.wmnet on all recursors
  • 15:46 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lvs2014.codfw.wmnet on all recursors
  • 15:46 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache lvs2014.codfw.wmnet on all recursors
  • 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P67607 and previous config saved to /var/cache/conftool/dbconfig/20240822-154544-ladsgroup.json
  • 15:45 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:45 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for lvs2013 - cmooney@cumin1002"
  • 15:45 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for lvs2013 - cmooney@cumin1002"
  • 15:41 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:37 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
  • 15:36 topranks: add vlans to trunk port on lsw1-c2-codfw facing new lvs2013 link T370927
  • 15:36 sukhe: upgrading A:cp-ulsfo to ATS 9.2.5: T339134
  • 15:31 topranks: disabling BGP on cr1-codfw and cr2-codfw towards lvs2013 in advance of host move to new switch T370927
  • 15:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P67606 and previous config saved to /var/cache/conftool/dbconfig/20240822-153037-ladsgroup.json
  • 15:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: move lvs2013 from asw to lsw
  • 15:30 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: move lvs2013 from asw to lsw
  • 15:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lsw1-c2-codfw.mgmt with reason: move lvs2013 from asw to lsw
  • 15:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lsw1-c2-codfw.mgmt with reason: move lvs2013 from asw to lsw
  • 15:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T370903)', diff saved to https://phabricator.wikimedia.org/P67605 and previous config saved to /var/cache/conftool/dbconfig/20240822-152620-ladsgroup.json
  • 15:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 15:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 15:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T370903)', diff saved to https://phabricator.wikimedia.org/P67604 and previous config saved to /var/cache/conftool/dbconfig/20240822-152558-ladsgroup.json
  • 15:22 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Hardware refresh
  • 15:22 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Hardware refresh
  • 15:21 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main2006.codfw.wmnet
  • 15:21 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for kafka-main2006.codfw.wmnet
  • 15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T371742)', diff saved to https://phabricator.wikimedia.org/P67603 and previous config saved to /var/cache/conftool/dbconfig/20240822-151530-ladsgroup.json
  • 15:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P67602 and previous config saved to /var/cache/conftool/dbconfig/20240822-151050-ladsgroup.json
  • 15:01 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 14:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P67601 and previous config saved to /var/cache/conftool/dbconfig/20240822-145543-ladsgroup.json
  • 14:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 14:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 14:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 14:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 14:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T370903)', diff saved to https://phabricator.wikimedia.org/P67600 and previous config saved to /var/cache/conftool/dbconfig/20240822-144036-ladsgroup.json
  • 14:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T371742)', diff saved to https://phabricator.wikimedia.org/P67599 and previous config saved to /var/cache/conftool/dbconfig/20240822-143655-ladsgroup.json
  • 14:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T371742)', diff saved to https://phabricator.wikimedia.org/P67598 and previous config saved to /var/cache/conftool/dbconfig/20240822-143633-ladsgroup.json
  • 14:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 14:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 14:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 14:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P67597 and previous config saved to /var/cache/conftool/dbconfig/20240822-142126-ladsgroup.json
  • 14:19 MichaelG_WMF: T372333, with I431d2a checked out, running mwscript /home/migr/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=dewiki --dry-run --search-index --db-table
  • 13:59 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 13:58 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 13:58 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P67593 and previous config saved to /var/cache/conftool/dbconfig/20240822-135731-ladsgroup.json
  • 13:57 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:57 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:55 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 13:54 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 13:53 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 13:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T371742)', diff saved to https://phabricator.wikimedia.org/P67592 and previous config saved to /var/cache/conftool/dbconfig/20240822-135111-ladsgroup.json
  • 13:50 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:48 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:46 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:45 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:45 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:44 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:43 TheresNoTime: UTC afternoon backport window closed
  • 13:42 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:42 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:42 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:42 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P67591 and previous config saved to /var/cache/conftool/dbconfig/20240822-134224-ladsgroup.json
  • 13:41 jayme@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:41 jayme@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:38 samtar@deploy1003: Finished scap sync-world: Backport for knwikisource : Create flood flag and add file importer right to Admin user group (T373073) (duration: 08m 20s)
  • 13:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
  • 13:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
  • 13:34 samtar@deploy1003: anzx, samtar: Continuing with sync
  • 13:32 samtar@deploy1003: anzx, samtar: Backport for knwikisource : Create flood flag and add file importer right to Admin user group (T373073) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:30 samtar@deploy1003: Started scap sync-world: Backport for knwikisource : Create flood flag and add file importer right to Admin user group (T373073)
  • 13:27 samtar@deploy1003: Finished scap sync-world: Backport for Use shellbox-video for videoscaling on group2 (T356241) (duration: 09m 10s)
  • 13:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T370903)', diff saved to https://phabricator.wikimedia.org/P67590 and previous config saved to /var/cache/conftool/dbconfig/20240822-132717-ladsgroup.json
  • 13:23 samtar@deploy1003: hnowlan, samtar: Continuing with sync
  • 13:23 samtar@deploy1003: hnowlan, samtar: Backport for Use shellbox-video for videoscaling on group2 (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 samtar@deploy1003: Started scap sync-world: Backport for Use shellbox-video for videoscaling on group2 (T356241)
  • 13:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 13:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 13:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 13:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1298.eqiad.wmnet with OS bullseye
  • 13:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T371742)', diff saved to https://phabricator.wikimedia.org/P67589 and previous config saved to /var/cache/conftool/dbconfig/20240822-131425-ladsgroup.json
  • 13:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T371742)', diff saved to https://phabricator.wikimedia.org/P67588 and previous config saved to /var/cache/conftool/dbconfig/20240822-131402-ladsgroup.json
  • 13:05 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 12:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P67587 and previous config saved to /var/cache/conftool/dbconfig/20240822-125855-ladsgroup.json
  • 12:55 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 12:54 cdanis@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 12:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P67586 and previous config saved to /var/cache/conftool/dbconfig/20240822-124348-ladsgroup.json
  • 12:37 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 12:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T371742)', diff saved to https://phabricator.wikimedia.org/P67584 and previous config saved to /var/cache/conftool/dbconfig/20240822-122841-ladsgroup.json
  • 12:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2011.codfw.wmnet
  • 12:22 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2011.codfw.wmnet
  • 12:18 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 12:17 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 12:17 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 12:15 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 12:12 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 12:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2011.codfw.wmnet with OS bullseye
  • 12:02 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:02 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:02 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:02 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:02 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:02 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:02 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:02 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:02 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:02 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:02 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:02 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:01 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:01 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T370903)', diff saved to https://phabricator.wikimedia.org/P67583 and previous config saved to /var/cache/conftool/dbconfig/20240822-120122-ladsgroup.json
  • 12:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 12:01 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:01 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T370903)', diff saved to https://phabricator.wikimedia.org/P67582 and previous config saved to /var/cache/conftool/dbconfig/20240822-120053-ladsgroup.json
  • 12:00 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:00 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:00 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:53 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:52 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T371742)', diff saved to https://phabricator.wikimedia.org/P67581 and previous config saved to /var/cache/conftool/dbconfig/20240822-115108-ladsgroup.json
  • 11:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T371742)', diff saved to https://phabricator.wikimedia.org/P67580 and previous config saved to /var/cache/conftool/dbconfig/20240822-115047-ladsgroup.json
  • 11:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
  • 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P67579 and previous config saved to /var/cache/conftool/dbconfig/20240822-114546-ladsgroup.json
  • 11:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
  • 11:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1298.eqiad.wmnet with reason: host reimage
  • 11:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1298.eqiad.wmnet with reason: host reimage
  • 11:38 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 11:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P67578 and previous config saved to /var/cache/conftool/dbconfig/20240822-113540-ladsgroup.json
  • 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P67577 and previous config saved to /var/cache/conftool/dbconfig/20240822-113038-ladsgroup.json
  • 11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f32c7881dc0>
  • 11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2011
  • 11:25 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2011
  • 11:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2011.codfw.wmnet 64.0.192.10.in-addr.arpa 4.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:25 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2011.codfw.wmnet 64.0.192.10.in-addr.arpa 4.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2011 - cgoubert@cumin1002"
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2011 - cgoubert@cumin1002"
  • 11:24 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1298.eqiad.wmnet with OS bullseye
  • 11:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 11:21 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P67576 and previous config saved to /var/cache/conftool/dbconfig/20240822-112033-ladsgroup.json
  • 11:19 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f32c7881dc0>
  • 11:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2011.codfw.wmnet with OS bullseye
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T370903)', diff saved to https://phabricator.wikimedia.org/P67575 and previous config saved to /var/cache/conftool/dbconfig/20240822-111531-ladsgroup.json
  • 11:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2011.codfw.wmnet
  • 11:14 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2011.codfw.wmnet
  • 11:12 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 11:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T371742)', diff saved to https://phabricator.wikimedia.org/P67574 and previous config saved to /var/cache/conftool/dbconfig/20240822-110526-ladsgroup.json
  • 10:49 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 10:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T370903)', diff saved to https://phabricator.wikimedia.org/P67573 and previous config saved to /var/cache/conftool/dbconfig/20240822-104314-ladsgroup.json
  • 10:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T370903)', diff saved to https://phabricator.wikimedia.org/P67572 and previous config saved to /var/cache/conftool/dbconfig/20240822-104252-ladsgroup.json
  • 10:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P67571 and previous config saved to /var/cache/conftool/dbconfig/20240822-102744-ladsgroup.json
  • 10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T371742)', diff saved to https://phabricator.wikimedia.org/P67570 and previous config saved to /var/cache/conftool/dbconfig/20240822-102613-ladsgroup.json
  • 10:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:24 cgoubert@deploy1003: Finished scap sync-world: mediawiki: Get rid of obsolete extract2.php redirect - 1064723 - T373048 (duration: 05m 43s)
  • 10:20 cgoubert@deploy1003: cgoubert: Continuing with sync
  • 10:19 cgoubert@deploy1003: cgoubert: mediawiki: Get rid of obsolete extract2.php redirect - 1064723 - T373048 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:18 XioNoX: cr1-eqiad> request vmhost power-on other-routing-engine - T372781
  • 10:18 cgoubert@deploy1003: Started scap sync-world: mediawiki: Get rid of obsolete extract2.php redirect - 1064723 - T373048
  • 10:16 XioNoX: cr1-eqiad> request vmhost power-off other-routing-engine - T372781
  • 10:15 XioNoX: cr1-eqiad> request vmhost snapshot recovery partition re0 - T372781
  • 10:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P67569 and previous config saved to /var/cache/conftool/dbconfig/20240822-101237-ladsgroup.json
  • 10:11 XioNoX: cr1-eqiad> request vmhost snapshot recovery re0 - T372781
  • 10:10 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main2006.codfw.wmnet with reason: Hardware refresh
  • 10:09 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main2006.codfw.wmnet with reason: Hardware refresh
  • 09:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T370903)', diff saved to https://phabricator.wikimedia.org/P67568 and previous config saved to /var/cache/conftool/dbconfig/20240822-095730-ladsgroup.json
  • 09:53 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 09:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 09:41 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Hardware refresh
  • 09:41 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Hardware refresh
  • 09:34 godog: start prometheus2006 bookworm upgrade - T326657
  • 09:32 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 09:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T370903)', diff saved to https://phabricator.wikimedia.org/P67567 and previous config saved to /var/cache/conftool/dbconfig/20240822-092631-ladsgroup.json
  • 09:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 09:24 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 09:20 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 09:13 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test2002.wikimedia.org
  • 08:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 08:57 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 08:54 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 08:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:49 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp-test2002.wikimedia.org
  • 08:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test1002.wikimedia.org
  • 08:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 08:47 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 08:44 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 08:39 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp-test1002.wikimedia.org
  • 08:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T371742)', diff saved to https://phabricator.wikimedia.org/P67566 and previous config saved to /var/cache/conftool/dbconfig/20240822-083706-ladsgroup.json
  • 08:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P67565 and previous config saved to /var/cache/conftool/dbconfig/20240822-082158-ladsgroup.json
  • 08:16 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.19 refs T366964
  • 08:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P67564 and previous config saved to /var/cache/conftool/dbconfig/20240822-080651-ladsgroup.json
  • 07:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T371742)', diff saved to https://phabricator.wikimedia.org/P67563 and previous config saved to /var/cache/conftool/dbconfig/20240822-075144-ladsgroup.json
  • 07:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T371742)', diff saved to https://phabricator.wikimedia.org/P67562 and previous config saved to /var/cache/conftool/dbconfig/20240822-072836-ladsgroup.json
  • 07:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 07:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 07:25 kartik@deploy1003: Finished scap sync-world: Backport for Enable Content/Section translation on WPs without MT (T361582) (duration: 07m 51s)
  • 07:20 eileen: civicrm upgraded from 7dc4401a to 975fc66e
  • 07:20 kartik@deploy1003: kartik: Continuing with sync
  • 07:19 kartik@deploy1003: kartik: Backport for Enable Content/Section translation on WPs without MT (T361582) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:17 kartik@deploy1003: Started scap sync-world: Backport for Enable Content/Section translation on WPs without MT (T361582)
  • 07:11 kartik@deploy1003: Finished scap sync-world: Backport for Content Translation: Revert MT threshold to default for Portuguese Wikipedia (T356356) (duration: 08m 01s)
  • 07:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 07:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 07:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T371742)', diff saved to https://phabricator.wikimedia.org/P67561 and previous config saved to /var/cache/conftool/dbconfig/20240822-070708-ladsgroup.json
  • 07:06 kartik@deploy1003: kartik: Continuing with sync
  • 07:05 kartik@deploy1003: kartik: Backport for Content Translation: Revert MT threshold to default for Portuguese Wikipedia (T356356) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:03 kartik@deploy1003: Started scap sync-world: Backport for Content Translation: Revert MT threshold to default for Portuguese Wikipedia (T356356)
  • 06:57 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4637
  • 06:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 4637
  • 06:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P67560 and previous config saved to /var/cache/conftool/dbconfig/20240822-065201-ladsgroup.json
  • 06:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40317
  • 06:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 40317
  • 06:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P67559 and previous config saved to /var/cache/conftool/dbconfig/20240822-063653-ladsgroup.json
  • 06:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T371742)', diff saved to https://phabricator.wikimedia.org/P67558 and previous config saved to /var/cache/conftool/dbconfig/20240822-062146-ladsgroup.json
  • 06:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T371742)', diff saved to https://phabricator.wikimedia.org/P67557 and previous config saved to /var/cache/conftool/dbconfig/20240822-061202-ladsgroup.json
  • 06:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 06:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67556 and previous config saved to /var/cache/conftool/dbconfig/20240822-061140-ladsgroup.json
  • 05:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P67555 and previous config saved to /var/cache/conftool/dbconfig/20240822-055633-ladsgroup.json
  • 05:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P67554 and previous config saved to /var/cache/conftool/dbconfig/20240822-054125-ladsgroup.json
  • 05:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67553 and previous config saved to /var/cache/conftool/dbconfig/20240822-052618-ladsgroup.json
  • 05:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67552 and previous config saved to /var/cache/conftool/dbconfig/20240822-051547-ladsgroup.json
  • 05:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T371742)', diff saved to https://phabricator.wikimedia.org/P67551 and previous config saved to /var/cache/conftool/dbconfig/20240822-051536-ladsgroup.json
  • 05:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P67550 and previous config saved to /var/cache/conftool/dbconfig/20240822-050027-ladsgroup.json
  • 04:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P67549 and previous config saved to /var/cache/conftool/dbconfig/20240822-044520-ladsgroup.json
  • 04:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T371742)', diff saved to https://phabricator.wikimedia.org/P67548 and previous config saved to /var/cache/conftool/dbconfig/20240822-043013-ladsgroup.json
  • 04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T371742)', diff saved to https://phabricator.wikimedia.org/P67547 and previous config saved to /var/cache/conftool/dbconfig/20240822-040551-ladsgroup.json
  • 04:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 04:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T371742)', diff saved to https://phabricator.wikimedia.org/P67546 and previous config saved to /var/cache/conftool/dbconfig/20240822-040529-ladsgroup.json
  • 03:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P67545 and previous config saved to /var/cache/conftool/dbconfig/20240822-035022-ladsgroup.json
  • 03:48 eileen: config revision changed from b1b3a1e6 to 69a40997
  • 03:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P67544 and previous config saved to /var/cache/conftool/dbconfig/20240822-033514-ladsgroup.json
  • 03:21 eileen: civicrm upgraded from b27307a9 to 7dc4401a
  • 03:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T371742)', diff saved to https://phabricator.wikimedia.org/P67543 and previous config saved to /var/cache/conftool/dbconfig/20240822-032007-ladsgroup.json
  • 03:10 eileen: civicrm upgraded from 3183c865 to b27307a9
  • 02:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T371742)', diff saved to https://phabricator.wikimedia.org/P67542 and previous config saved to /var/cache/conftool/dbconfig/20240822-025529-ladsgroup.json
  • 02:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 02:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 02:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 02:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 02:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T371742)', diff saved to https://phabricator.wikimedia.org/P67541 and previous config saved to /var/cache/conftool/dbconfig/20240822-025451-ladsgroup.json
  • 02:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P67540 and previous config saved to /var/cache/conftool/dbconfig/20240822-023944-ladsgroup.json
  • 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P67539 and previous config saved to /var/cache/conftool/dbconfig/20240822-022437-ladsgroup.json
  • 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T371742)', diff saved to https://phabricator.wikimedia.org/P67538 and previous config saved to /var/cache/conftool/dbconfig/20240822-020930-ladsgroup.json
  • 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T371742)', diff saved to https://phabricator.wikimedia.org/P67537 and previous config saved to /var/cache/conftool/dbconfig/20240822-014441-ladsgroup.json
  • 01:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T371742)', diff saved to https://phabricator.wikimedia.org/P67536 and previous config saved to /var/cache/conftool/dbconfig/20240822-014419-ladsgroup.json
  • 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2043.codfw.wmnet with OS bookworm
  • 01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P67535 and previous config saved to /var/cache/conftool/dbconfig/20240822-012912-ladsgroup.json
  • 01:26 eileen: civicrm upgraded from ed72cf6c to 3183c865
  • 01:26 eileen: config revision changed from b1b3a1e6 to 69a40997
  • 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2044.codfw.wmnet with OS bookworm
  • 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2042.codfw.wmnet with OS bookworm
  • 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P67534 and previous config saved to /var/cache/conftool/dbconfig/20240822-011405-ladsgroup.json
  • 01:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2041.codfw.wmnet with OS bookworm
  • 01:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2040.codfw.wmnet with OS bookworm
  • 01:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2042.codfw.wmnet with reason: host reimage
  • 01:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2044.codfw.wmnet with reason: host reimage
  • 01:00 eileen: civicrm upgraded from 3b22c823 to ed72cf6c
  • 00:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T371742)', diff saved to https://phabricator.wikimedia.org/P67533 and previous config saved to /var/cache/conftool/dbconfig/20240822-005857-ladsgroup.json
  • 00:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2043.codfw.wmnet with reason: host reimage
  • 00:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2041.codfw.wmnet with reason: host reimage
  • 00:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2044.codfw.wmnet with reason: host reimage
  • 00:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2043.codfw.wmnet with reason: host reimage
  • 00:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2042.codfw.wmnet with reason: host reimage
  • 00:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2040.codfw.wmnet with reason: host reimage
  • 00:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2041.codfw.wmnet with reason: host reimage
  • 00:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2040.codfw.wmnet with reason: host reimage
  • 00:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2044.codfw.wmnet with OS bookworm
  • 00:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2043.codfw.wmnet with OS bookworm
  • 00:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2042.codfw.wmnet with OS bookworm
  • 00:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2041.codfw.wmnet with OS bookworm
  • 00:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2040.codfw.wmnet with OS bookworm
  • 00:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T371742)', diff saved to https://phabricator.wikimedia.org/P67532 and previous config saved to /var/cache/conftool/dbconfig/20240822-003352-ladsgroup.json
  • 00:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 00:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 00:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T371742)', diff saved to https://phabricator.wikimedia.org/P67531 and previous config saved to /var/cache/conftool/dbconfig/20240822-003330-ladsgroup.json
  • 00:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2040.codfw.wmnet with OS bookworm
  • 00:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P67530 and previous config saved to /var/cache/conftool/dbconfig/20240822-001823-ladsgroup.json
  • 00:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2040.codfw.wmnet with OS bookworm
  • 00:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2040.codfw.wmnet with OS bookworm
  • 00:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2040.codfw.wmnet with OS bookworm
  • 00:12 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1298.eqiad.wmnet with OS bullseye
  • 00:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1298.eqiad.wmnet with OS bullseye
  • 00:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2044.codfw.wmnet with OS bookworm
  • 00:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2043.codfw.wmnet with OS bookworm
  • 00:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2044.codfw.wmnet with OS bookworm
  • 00:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2042.codfw.wmnet with OS bookworm
  • 00:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2041.codfw.wmnet with OS bookworm
  • 00:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2040.codfw.wmnet with OS bookworm
  • 00:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2043.codfw.wmnet with OS bookworm
  • 00:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2042.codfw.wmnet with OS bookworm
  • 00:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2041.codfw.wmnet with OS bookworm
  • 00:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2040.codfw.wmnet with OS bookworm
  • 00:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1298.eqiad.wmnet with OS bullseye
  • 00:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1298.eqiad.wmnet with OS bullseye
  • 00:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2038.codfw.wmnet with OS bookworm
  • 00:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P67529 and previous config saved to /var/cache/conftool/dbconfig/20240822-000315-ladsgroup.json
  • 00:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1298.eqiad.wmnet with OS bullseye
  • 00:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2039.codfw.wmnet with OS bookworm
  • 00:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"

2024-08-21

  • 23:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2036.codfw.wmnet with OS bookworm
  • 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1298.eqiad.wmnet with OS bullseye
  • 23:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67528 and previous config saved to /var/cache/conftool/dbconfig/20240821-235559-ladsgroup.json
  • 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2037.codfw.wmnet with OS bookworm
  • 23:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T371742)', diff saved to https://phabricator.wikimedia.org/P67527 and previous config saved to /var/cache/conftool/dbconfig/20240821-234808-ladsgroup.json
  • 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2038.codfw.wmnet with reason: host reimage
  • 23:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2039.codfw.wmnet with reason: host reimage
  • 23:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2036.codfw.wmnet with reason: host reimage
  • 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P67526 and previous config saved to /var/cache/conftool/dbconfig/20240821-234051-ladsgroup.json
  • 23:40 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2039.codfw.wmnet with reason: host reimage
  • 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2037.codfw.wmnet with reason: host reimage
  • 23:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2038.codfw.wmnet with reason: host reimage
  • 23:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2036.codfw.wmnet with reason: host reimage
  • 23:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2035.codfw.wmnet with OS bookworm
  • 23:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2037.codfw.wmnet with reason: host reimage
  • 23:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P67525 and previous config saved to /var/cache/conftool/dbconfig/20240821-232544-ladsgroup.json
  • 23:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2039.codfw.wmnet with OS bookworm
  • 23:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T371742)', diff saved to https://phabricator.wikimedia.org/P67524 and previous config saved to /var/cache/conftool/dbconfig/20240821-232341-ladsgroup.json
  • 23:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 23:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2035.codfw.wmnet with reason: host reimage
  • 23:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2038.codfw.wmnet with OS bookworm
  • 23:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2037.codfw.wmnet with OS bookworm
  • 23:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2036.codfw.wmnet with OS bookworm
  • 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2035.codfw.wmnet with reason: host reimage
  • 23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67523 and previous config saved to /var/cache/conftool/dbconfig/20240821-231037-ladsgroup.json
  • 23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67522 and previous config saved to /var/cache/conftool/dbconfig/20240821-230600-ladsgroup.json
  • 23:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T370903)', diff saved to https://phabricator.wikimedia.org/P67521 and previous config saved to /var/cache/conftool/dbconfig/20240821-230549-ladsgroup.json
  • 22:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 22:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T371742)', diff saved to https://phabricator.wikimedia.org/P67520 and previous config saved to /var/cache/conftool/dbconfig/20240821-225436-ladsgroup.json
  • 22:52 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2035.codfw.wmnet with OS bookworm
  • 22:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P67519 and previous config saved to /var/cache/conftool/dbconfig/20240821-225042-ladsgroup.json
  • 22:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P67518 and previous config saved to /var/cache/conftool/dbconfig/20240821-223929-ladsgroup.json
  • 22:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P67517 and previous config saved to /var/cache/conftool/dbconfig/20240821-223535-ladsgroup.json
  • 22:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2022.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 22:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P67516 and previous config saved to /var/cache/conftool/dbconfig/20240821-222422-ladsgroup.json
  • 22:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T370903)', diff saved to https://phabricator.wikimedia.org/P67515 and previous config saved to /var/cache/conftool/dbconfig/20240821-222028-ladsgroup.json
  • 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T370903)', diff saved to https://phabricator.wikimedia.org/P67514 and previous config saved to /var/cache/conftool/dbconfig/20240821-221450-ladsgroup.json
  • 22:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 22:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 22:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 22:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 22:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T370903)', diff saved to https://phabricator.wikimedia.org/P67512 and previous config saved to /var/cache/conftool/dbconfig/20240821-220947-ladsgroup.json
  • 22:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T371742)', diff saved to https://phabricator.wikimedia.org/P67511 and previous config saved to /var/cache/conftool/dbconfig/20240821-220915-ladsgroup.json
  • 22:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T371742)', diff saved to https://phabricator.wikimedia.org/P67510 and previous config saved to /var/cache/conftool/dbconfig/20240821-215537-ladsgroup.json
  • 21:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 21:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P67509 and previous config saved to /var/cache/conftool/dbconfig/20240821-215440-ladsgroup.json
  • 21:42 amastilovic@deploy1003: Finished deploy [airflow-dags/wmde@109c99e]: (no justification provided) (duration: 00m 03s)
  • 21:42 amastilovic@deploy1003: Started deploy [airflow-dags/wmde@109c99e]: (no justification provided)
  • 21:42 amastilovic@deploy1003: Finished deploy [airflow-dags/search@109c99e]: (no justification provided) (duration: 00m 03s)
  • 21:42 amastilovic@deploy1003: Started deploy [airflow-dags/search@109c99e]: (no justification provided)
  • 21:42 amastilovic@deploy1003: Finished deploy [airflow-dags/analytics_product@1856d12]: (no justification provided) (duration: 00m 03s)
  • 21:41 amastilovic@deploy1003: Started deploy [airflow-dags/analytics_product@1856d12]: (no justification provided)
  • 21:41 amastilovic@deploy1003: Finished deploy [airflow-dags/research@109c99e]: (no justification provided) (duration: 00m 03s)
  • 21:41 amastilovic@deploy1003: Started deploy [airflow-dags/research@109c99e]: (no justification provided)
  • 21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P67508 and previous config saved to /var/cache/conftool/dbconfig/20240821-213932-ladsgroup.json
  • 21:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2022.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 21:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 21:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 21:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T371742)', diff saved to https://phabricator.wikimedia.org/P67507 and previous config saved to /var/cache/conftool/dbconfig/20240821-213323-ladsgroup.json
  • 21:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 21:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T370903)', diff saved to https://phabricator.wikimedia.org/P67506 and previous config saved to /var/cache/conftool/dbconfig/20240821-212425-ladsgroup.json
  • 21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T370903)', diff saved to https://phabricator.wikimedia.org/P67505 and previous config saved to /var/cache/conftool/dbconfig/20240821-212024-ladsgroup.json
  • 21:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T370903)', diff saved to https://phabricator.wikimedia.org/P67504 and previous config saved to /var/cache/conftool/dbconfig/20240821-212002-ladsgroup.json
  • 21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P67503 and previous config saved to /var/cache/conftool/dbconfig/20240821-211816-ladsgroup.json
  • 21:11 amastilovic@deploy1003: Finished deploy [airflow-dags/analytics@1856d12]: (no justification provided) (duration: 01m 35s)
  • 21:09 amastilovic@deploy1003: Started deploy [airflow-dags/analytics@1856d12]: (no justification provided)
  • 21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P67502 and previous config saved to /var/cache/conftool/dbconfig/20240821-210455-ladsgroup.json
  • 21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P67501 and previous config saved to /var/cache/conftool/dbconfig/20240821-210309-ladsgroup.json
  • 21:00 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 20:58 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 20:58 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 20:57 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 20:57 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:54 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 20:53 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 20:53 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 20:52 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 20:52 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:51 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P67500 and previous config saved to /var/cache/conftool/dbconfig/20240821-204948-ladsgroup.json
  • 20:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T371742)', diff saved to https://phabricator.wikimedia.org/P67499 and previous config saved to /var/cache/conftool/dbconfig/20240821-204802-ladsgroup.json
  • 20:41 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:40 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T371742)', diff saved to https://phabricator.wikimedia.org/P67498 and previous config saved to /var/cache/conftool/dbconfig/20240821-203753-ladsgroup.json
  • 20:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 20:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67497 and previous config saved to /var/cache/conftool/dbconfig/20240821-203731-ladsgroup.json
  • 20:35 cjming: end of UTC late backport window
  • 20:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T370903)', diff saved to https://phabricator.wikimedia.org/P67496 and previous config saved to /var/cache/conftool/dbconfig/20240821-203442-ladsgroup.json
  • 20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T370903)', diff saved to https://phabricator.wikimedia.org/P67495 and previous config saved to /var/cache/conftool/dbconfig/20240821-203029-ladsgroup.json
  • 20:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T370903)', diff saved to https://phabricator.wikimedia.org/P67494 and previous config saved to /var/cache/conftool/dbconfig/20240821-203007-ladsgroup.json
  • 20:29 cjming@deploy1003: Finished scap sync-world: Backport for ve.ui.CodeMirrorAction.v6: use infinity viewport to avoid misalignment (T357482) (duration: 13m 14s)
  • 20:25 cjming@deploy1003: musikanimal, cjming: Continuing with sync
  • 20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P67492 and previous config saved to /var/cache/conftool/dbconfig/20240821-202224-ladsgroup.json
  • 20:21 cjming@deploy1003: musikanimal, cjming: Backport for ve.ui.CodeMirrorAction.v6: use infinity viewport to avoid misalignment (T357482) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:16 cjming@deploy1003: Started scap sync-world: Backport for ve.ui.CodeMirrorAction.v6: use infinity viewport to avoid misalignment (T357482)
  • 20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P67489 and previous config saved to /var/cache/conftool/dbconfig/20240821-201500-ladsgroup.json
  • 20:14 bearloga@deploy1003: Finished deploy [airflow-dags/analytics_product@1856d12]: (no justification provided) (duration: 00m 03s)
  • 20:14 bearloga@deploy1003: Started deploy [airflow-dags/analytics_product@1856d12]: (no justification provided)
  • 20:12 bearloga@deploy1003: Finished deploy [airflow-dags/analytics_product@1856d12]: (no justification provided) (duration: 00m 17s)
  • 20:11 bearloga@deploy1003: Started deploy [airflow-dags/analytics_product@1856d12]: (no justification provided)
  • 20:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P67488 and previous config saved to /var/cache/conftool/dbconfig/20240821-200716-ladsgroup.json
  • 19:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P67487 and previous config saved to /var/cache/conftool/dbconfig/20240821-195952-ladsgroup.json
  • 19:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67486 and previous config saved to /var/cache/conftool/dbconfig/20240821-195209-ladsgroup.json
  • 19:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T370903)', diff saved to https://phabricator.wikimedia.org/P67485 and previous config saved to /var/cache/conftool/dbconfig/20240821-194445-ladsgroup.json
  • 19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67484 and previous config saved to /var/cache/conftool/dbconfig/20240821-194036-ladsgroup.json
  • 19:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T371742)', diff saved to https://phabricator.wikimedia.org/P67483 and previous config saved to /var/cache/conftool/dbconfig/20240821-194014-ladsgroup.json
  • 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T370903)', diff saved to https://phabricator.wikimedia.org/P67482 and previous config saved to /var/cache/conftool/dbconfig/20240821-193843-ladsgroup.json
  • 19:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T370903)', diff saved to https://phabricator.wikimedia.org/P67481 and previous config saved to /var/cache/conftool/dbconfig/20240821-193821-ladsgroup.json
  • 19:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1017.eqiad.wmnet with OS bookworm
  • 19:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P67480 and previous config saved to /var/cache/conftool/dbconfig/20240821-192507-ladsgroup.json
  • 19:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P67479 and previous config saved to /var/cache/conftool/dbconfig/20240821-192314-ladsgroup.json
  • 19:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P67478 and previous config saved to /var/cache/conftool/dbconfig/20240821-190959-ladsgroup.json
  • 19:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1017.eqiad.wmnet with reason: host reimage
  • 19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P67477 and previous config saved to /var/cache/conftool/dbconfig/20240821-190807-ladsgroup.json
  • 19:06 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1017.eqiad.wmnet with reason: host reimage
  • 18:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T371742)', diff saved to https://phabricator.wikimedia.org/P67476 and previous config saved to /var/cache/conftool/dbconfig/20240821-185452-ladsgroup.json
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T370903)', diff saved to https://phabricator.wikimedia.org/P67475 and previous config saved to /var/cache/conftool/dbconfig/20240821-185300-ladsgroup.json
  • 18:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1017.eqiad.wmnet with OS bookworm
  • 18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T370903)', diff saved to https://phabricator.wikimedia.org/P67474 and previous config saved to /var/cache/conftool/dbconfig/20240821-184633-ladsgroup.json
  • 18:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T370903)', diff saved to https://phabricator.wikimedia.org/P67473 and previous config saved to /var/cache/conftool/dbconfig/20240821-184611-ladsgroup.json
  • 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T371742)', diff saved to https://phabricator.wikimedia.org/P67472 and previous config saved to /var/cache/conftool/dbconfig/20240821-184427-ladsgroup.json
  • 18:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T371742)', diff saved to https://phabricator.wikimedia.org/P67471 and previous config saved to /var/cache/conftool/dbconfig/20240821-184405-ladsgroup.json
  • 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bookworm
  • 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1017.eqiad.wmnet with OS bookworm
  • 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1018.eqiad.wmnet with OS bookworm
  • 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1020.eqiad.wmnet with OS bookworm
  • 18:36 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bookworm
  • 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P67470 and previous config saved to /var/cache/conftool/dbconfig/20240821-183104-ladsgroup.json
  • 18:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P67469 and previous config saved to /var/cache/conftool/dbconfig/20240821-182858-ladsgroup.json
  • 18:21 swfrench-wmf: imported php-memcached_3.2.0++-1+wmf11u1 into component/php81 - T372507
  • 18:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage
  • 18:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage
  • 18:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P67468 and previous config saved to /var/cache/conftool/dbconfig/20240821-181556-ladsgroup.json
  • 18:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage
  • 18:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P67467 and previous config saved to /var/cache/conftool/dbconfig/20240821-181351-ladsgroup.json
  • 18:13 swfrench-wmf: imported php-redis_6.0.2-1+wmf11u1 into component/php81 - T372507
  • 18:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage
  • 18:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage
  • 18:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage
  • 18:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage
  • 18:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage
  • 18:04 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T370903)', diff saved to https://phabricator.wikimedia.org/P67466 and previous config saved to /var/cache/conftool/dbconfig/20240821-180049-ladsgroup.json
  • 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T371742)', diff saved to https://phabricator.wikimedia.org/P67465 and previous config saved to /var/cache/conftool/dbconfig/20240821-175843-ladsgroup.json
  • 17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T370903)', diff saved to https://phabricator.wikimedia.org/P67464 and previous config saved to /var/cache/conftool/dbconfig/20240821-175638-ladsgroup.json
  • 17:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:56 swfrench-wmf: imported php-igbinary_3.2.15-1+wmf11u1 into component/php81 - T372507
  • 17:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm
  • 17:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bookworm
  • 17:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bookworm
  • 17:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1017.eqiad.wmnet with OS bookworm
  • 17:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bookworm
  • 17:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 17:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 17:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T371742)', diff saved to https://phabricator.wikimedia.org/P67463 and previous config saved to /var/cache/conftool/dbconfig/20240821-174750-ladsgroup.json
  • 17:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 17:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 17:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T371742)', diff saved to https://phabricator.wikimedia.org/P67462 and previous config saved to /var/cache/conftool/dbconfig/20240821-174728-ladsgroup.json
  • 17:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 17:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67461 and previous config saved to /var/cache/conftool/dbconfig/20240821-174351-ladsgroup.json
  • 17:39 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 17:37 swfrench-wmf: imported xdebug_3.3.2-1+wmf11u1 into component/php81 - T372507
  • 17:36 swfrench-wmf: imported wikidiff2_1.14.1-2+wmf11u1 into component/php81 - T372507
  • 17:35 swfrench-wmf: imported tideways_5.0.4-16+wmf11u1 into component/php81 - T372507
  • 17:35 ladsgroup@deploy1003: Finished scap sync-world: Backport for Change the disabled query page for commons (T369024) (duration: 07m 36s)
  • 17:34 swfrench-wmf: imported php-yaml_2.2.3-2+wmf11u1 into component/php81 - T372507
  • 17:34 swfrench-wmf: imported php-wmerrors_2.0.0-1+wmf11u1 into component/php81 - T372507
  • 17:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P67460 and previous config saved to /var/cache/conftool/dbconfig/20240821-173221-ladsgroup.json
  • 17:31 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 17:30 ladsgroup@deploy1003: ladsgroup: Backport for Change the disabled query page for commons (T369024) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P67459 and previous config saved to /var/cache/conftool/dbconfig/20240821-172844-ladsgroup.json
  • 17:28 ladsgroup@deploy1003: Started scap sync-world: Backport for Change the disabled query page for commons (T369024)
  • 17:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ganeti2036.codfw.wmnet
  • 17:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1017.eqiad.wmnet with OS bookworm
  • 17:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P67458 and previous config saved to /var/cache/conftool/dbconfig/20240821-171714-ladsgroup.json
  • 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P67457 and previous config saved to /var/cache/conftool/dbconfig/20240821-171337-ladsgroup.json
  • 17:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T371742)', diff saved to https://phabricator.wikimedia.org/P67455 and previous config saved to /var/cache/conftool/dbconfig/20240821-170206-ladsgroup.json
  • 16:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67454 and previous config saved to /var/cache/conftool/dbconfig/20240821-165829-ladsgroup.json
  • 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67453 and previous config saved to /var/cache/conftool/dbconfig/20240821-165415-ladsgroup.json
  • 16:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67452 and previous config saved to /var/cache/conftool/dbconfig/20240821-165353-ladsgroup.json
  • 16:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T371742)', diff saved to https://phabricator.wikimedia.org/P67451 and previous config saved to /var/cache/conftool/dbconfig/20240821-165027-ladsgroup.json
  • 16:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ganeti2036.codfw.wmnet
  • 16:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ganeti2036.codfw.wmnet
  • 16:45 swfrench-wmf: imported php-pcov_1.0.11-5+wmf11u1 into component/php81 - T372507
  • 16:45 swfrench-wmf: imported php-msgpack_2.2.0-4+wmf11u1 into component/php81 - T372507
  • 16:44 swfrench-wmf: imported php-luasandbox_4.1.2-1+wmf11u1 into component/php81 - T372507
  • 16:42 swfrench-wmf: imported php-imagick_3.7.0-6+wmf11u1 into component/php81 - T372507
  • 16:41 swfrench-wmf: imported php-excimer_1.2.2-1+wmf11u1 into component/php81 - T372507
  • 16:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2035.codfw.wmnet with OS bookworm
  • 16:40 swfrench-wmf: imported php-apcu_5.1.23-1+wmf11u1 into component/php81 - T372507
  • 16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P67450 and previous config saved to /var/cache/conftool/dbconfig/20240821-163846-ladsgroup.json
  • 16:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P67449 and previous config saved to /var/cache/conftool/dbconfig/20240821-162339-ladsgroup.json
  • 16:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67448 and previous config saved to /var/cache/conftool/dbconfig/20240821-160831-ladsgroup.json
  • 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67447 and previous config saved to /var/cache/conftool/dbconfig/20240821-160345-ladsgroup.json
  • 16:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 16:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T370903)', diff saved to https://phabricator.wikimedia.org/P67446 and previous config saved to /var/cache/conftool/dbconfig/20240821-160323-ladsgroup.json
  • 15:57 MichaelG_WMF: T372333, with I431d2a checked out, running mwscript /home/migr/GrowthExperiments/maintenance/fixLinkRecommendationData.php --dry-run --wiki=testwiki --search-index --db-table
  • 15:56 ejegg: fundraising python tools upgraded from 490a7b3f to 3f7b238d
  • 15:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2035.codfw.wmnet with OS bookworm
  • 15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P67445 and previous config saved to /var/cache/conftool/dbconfig/20240821-154815-ladsgroup.json
  • 15:36 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P67444 and previous config saved to /var/cache/conftool/dbconfig/20240821-153306-ladsgroup.json
  • 15:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T370903)', diff saved to https://phabricator.wikimedia.org/P67443 and previous config saved to /var/cache/conftool/dbconfig/20240821-151759-ladsgroup.json
  • 15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T370903)', diff saved to https://phabricator.wikimedia.org/P67442 and previous config saved to /var/cache/conftool/dbconfig/20240821-151441-ladsgroup.json
  • 15:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T370903)', diff saved to https://phabricator.wikimedia.org/P67441 and previous config saved to /var/cache/conftool/dbconfig/20240821-151419-ladsgroup.json
  • 14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P67440 and previous config saved to /var/cache/conftool/dbconfig/20240821-145912-ladsgroup.json
  • 14:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2024.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67439 and previous config saved to /var/cache/conftool/dbconfig/20240821-144648-marostegui.json
  • 14:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T367856)', diff saved to https://phabricator.wikimedia.org/P67438 and previous config saved to /var/cache/conftool/dbconfig/20240821-144625-marostegui.json
  • 14:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P67437 and previous config saved to /var/cache/conftool/dbconfig/20240821-144405-ladsgroup.json
  • 14:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2024.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:41 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ganeti2036.codfw.wmnet
  • 14:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2024']
  • 14:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2024']
  • 14:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P67434 and previous config saved to /var/cache/conftool/dbconfig/20240821-143118-marostegui.json
  • 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T370903)', diff saved to https://phabricator.wikimedia.org/P67433 and previous config saved to /var/cache/conftool/dbconfig/20240821-142858-ladsgroup.json
  • 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T370903)', diff saved to https://phabricator.wikimedia.org/P67432 and previous config saved to /var/cache/conftool/dbconfig/20240821-142536-ladsgroup.json
  • 14:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T370903)', diff saved to https://phabricator.wikimedia.org/P67431 and previous config saved to /var/cache/conftool/dbconfig/20240821-142514-ladsgroup.json
  • 14:22 topranks: enable PyBal on lvs2013 to swing traffic back from lvs2014
  • 14:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2035.codfw.wmnet with OS bookworm
  • 14:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P67430 and previous config saved to /var/cache/conftool/dbconfig/20240821-141611-marostegui.json
  • 14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T367856)', diff saved to https://phabricator.wikimedia.org/P67428 and previous config saved to /var/cache/conftool/dbconfig/20240821-140104-marostegui.json
  • 13:58 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 13:58 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 13:55 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P67427 and previous config saved to /var/cache/conftool/dbconfig/20240821-135458-ladsgroup.json
  • 13:54 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:52 cdanis@deploy1003: Finished scap sync-world: Backport for [arwikinews]: Upgrade license to CC BY-SA 4.0 (T372730) (duration: 10m 05s)
  • 13:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:47 cdanis@deploy1003: anwon, cdanis: Continuing with sync
  • 13:44 cdanis@deploy1003: anwon, cdanis: Backport for [arwikinews]: Upgrade license to CC BY-SA 4.0 (T372730) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:42 cdanis@deploy1003: Started scap sync-world: Backport for [arwikinews]: Upgrade license to CC BY-SA 4.0 (T372730)
  • 13:40 cdanis@deploy1003: Finished scap sync-world: Backport for Enable shellbox-video for enwiki (T356241) (duration: 07m 18s)
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T370903)', diff saved to https://phabricator.wikimedia.org/P67426 and previous config saved to /var/cache/conftool/dbconfig/20240821-133950-ladsgroup.json
  • 13:39 topranks: disable PyBal on lvs2013 to switch traffic to lvs2014
  • 13:35 cdanis@deploy1003: hnowlan, cdanis: Continuing with sync
  • 13:35 cdanis@deploy1003: hnowlan, cdanis: Backport for Enable shellbox-video for enwiki (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T370903)', diff saved to https://phabricator.wikimedia.org/P67425 and previous config saved to /var/cache/conftool/dbconfig/20240821-133411-ladsgroup.json
  • 13:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 13:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T370903)', diff saved to https://phabricator.wikimedia.org/P67424 and previous config saved to /var/cache/conftool/dbconfig/20240821-133349-ladsgroup.json
  • 13:33 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: test failover lvs2013 to ls2014
  • 13:33 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: test failover lvs2013 to ls2014
  • 13:32 cdanis@deploy1003: Started scap sync-world: Backport for Enable shellbox-video for enwiki (T356241)
  • 13:31 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr[1-2]-codfw with reason: test failover lvs2013 to ls2014
  • 13:31 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr[1-2]-codfw with reason: test failover lvs2013 to ls2014
  • 13:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2035.codfw.wmnet with OS bookworm
  • 13:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P67423 and previous config saved to /var/cache/conftool/dbconfig/20240821-131842-ladsgroup.json
  • 13:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "mgmt: add role - ayounsi@cumin1002"
  • 13:16 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "mgmt: add role - ayounsi@cumin1002"
  • 13:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P67422 and previous config saved to /var/cache/conftool/dbconfig/20240821-130335-ladsgroup.json
  • 12:59 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 12:53 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T370903)', diff saved to https://phabricator.wikimedia.org/P67421 and previous config saved to /var/cache/conftool/dbconfig/20240821-124828-ladsgroup.json
  • 12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T370903)', diff saved to https://phabricator.wikimedia.org/P67420 and previous config saved to /var/cache/conftool/dbconfig/20240821-124252-ladsgroup.json
  • 12:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:34 XioNoX: add python3-pynetbox_7.4.0_all.deb to reprepro - T371890
  • 12:23 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 12:22 XioNoX: install python3-pynetbox_7.4.0 manually on cumin2002
  • 12:22 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
  • 12:14 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
  • 12:07 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 11:53 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 11:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:01 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 10:55 stran@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 10:54 stran@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 10:53 stran@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:52 stran@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:50 stran@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 10:48 stran@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 10:48 stran@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 10:47 stran@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 10:11 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 10:04 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 09:51 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 09:44 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 09:41 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 09:34 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 09:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262725
  • 09:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262725
  • 09:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28173
  • 09:13 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 28173
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67418 and previous config saved to /var/cache/conftool/dbconfig/20240821-090421-root.json
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67417 and previous config saved to /var/cache/conftool/dbconfig/20240821-084915-root.json
  • 08:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67416 and previous config saved to /var/cache/conftool/dbconfig/20240821-083410-root.json
  • 08:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67415 and previous config saved to /var/cache/conftool/dbconfig/20240821-081904-root.json
  • 08:15 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.19 refs T366964
  • 08:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67414 and previous config saved to /var/cache/conftool/dbconfig/20240821-080359-root.json
  • 07:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67413 and previous config saved to /var/cache/conftool/dbconfig/20240821-074854-root.json
  • 07:42 XioNoX: rollback JIO_DIRECT from cr2-eqsin AVOID-PATHS
  • 07:39 XioNoX: enable cloudsw1-d5-eqiad:xe-0/0/21 (SFP now inserted)
  • 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67412 and previous config saved to /var/cache/conftool/dbconfig/20240821-073348-root.json
  • 07:27 brouberol@cumin1002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for wdqs1024.eqiad.wmnet: Renew puppet certificate - brouberol@cumin1002
  • 07:27 brouberol@cumin1002: START - Cookbook sre.puppet.renew-cert for wdqs1024.eqiad.wmnet: Renew puppet certificate - brouberol@cumin1002
  • 07:02 XioNoX: remove bgp session to mw2291 on codfw routers (host renumbered)
  • 06:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67411 and previous config saved to /var/cache/conftool/dbconfig/20240821-065624-arnaudb.json
  • 06:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67410 and previous config saved to /var/cache/conftool/dbconfig/20240821-064119-arnaudb.json
  • 06:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67409 and previous config saved to /var/cache/conftool/dbconfig/20240821-062613-arnaudb.json
  • 06:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67408 and previous config saved to /var/cache/conftool/dbconfig/20240821-061108-arnaudb.json
  • 05:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 15%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67407 and previous config saved to /var/cache/conftool/dbconfig/20240821-055602-arnaudb.json
  • 05:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67406 and previous config saved to /var/cache/conftool/dbconfig/20240821-054057-arnaudb.json
  • 01:56 eileen: config revision changed from 3ef2ec32 to b1b3a1e6
  • 01:27 eileen: config revision changed from f569b590 to 3ef2ec32 disable jobs to run index-add
  • 00:51 eileen: civicrm upgraded from 1022abf1 to 3b22c823

2024-08-20

  • 22:44 rzl@cumin1002: dbctl commit (dc=all): 'db1206 depooled', diff saved to https://phabricator.wikimedia.org/P67402 and previous config saved to /var/cache/conftool/dbconfig/20240820-224431-rzl.json
  • 22:02 dancy@deploy1003: Installing scap version "4.99.0" for 210 hosts
  • 21:45 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
  • 21:13 cjming: end of UTC late backport window
  • 21:12 cjming@deploy1003: Finished scap sync-world: Backport for Revert "kaawiktionary: add custom logos" (duration: 08m 18s)
  • 21:07 cjming@deploy1003: trainbranchbot, cjming: Continuing with sync
  • 21:07 cjming@deploy1003: trainbranchbot, cjming: Backport for Revert "kaawiktionary: add custom logos" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:04 swfrench-wmf: imported dh-php_5.4+wmf11u1 into component/php81 - T372507
  • 21:03 cjming@deploy1003: Started scap sync-world: Backport for Revert "kaawiktionary: add custom logos"
  • 21:01 cjming@deploy1003: Sync cancelled.
  • 20:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
  • 20:57 cjming@deploy1003: cjming, chlod: Backport for kaawiktionary: add custom logos (T368868) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:55 cjming@deploy1003: Started scap sync-world: Backport for kaawiktionary: add custom logos (T368868)
  • 20:54 cjming@deploy1003: Finished scap sync-world: Backport for Revert "kawikisource: add custom logos" (duration: 08m 53s)
  • 20:53 swfrench-wmf: imported php-defaults_92+wmf11u1 into component/php81 - T372507
  • 20:52 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
  • 20:49 cjming@deploy1003: cjming, trainbranchbot: Continuing with sync
  • 20:49 cjming@deploy1003: cjming, trainbranchbot: Backport for Revert "kawikisource: add custom logos" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:45 cjming@deploy1003: Started scap sync-world: Backport for Revert "kawikisource: add custom logos"
  • 20:41 cjming@deploy1003: Sync cancelled.
  • 20:39 cjming@deploy1003: cjming, chlod: Backport for kawikisource: add custom logos (T368868) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:36 cjming@deploy1003: Started scap sync-world: Backport for kawikisource: add custom logos (T368868)
  • 20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:32 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd2004.codfw.wmnet with OS bookworm
  • 20:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:31 swfrench-wmf: imported php8.1_8.1.29-1+wmf11u1 into component/php81 - T372507
  • 20:26 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:05 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
  • 20:05 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2024.codfw.wmnet']
  • 20:04 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2024.codfw.wmnet']
  • 20:02 Emperor: depool/restart/repool ms-fe2014 T360913
  • 20:02 Emperor: depool/restart/repool ms-fe2012 T360913
  • 20:01 Emperor: depool/restart/repool ms-fe2011 T360913
  • 20:00 Emperor: depool/restart/repool ms-fe2009 T360913
  • 19:58 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2024.codfw.wmnet']
  • 19:35 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2024.codfw.wmnet']
  • 19:16 topranks: restarting netbox service on netbox1003 to update script
  • 18:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1017.eqiad.wmnet with OS bookworm
  • 18:24 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
  • 18:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2014.codfw.wmnet with OS bullseye
  • 17:51 swfrench-wmf: mediawiki statsd exporter deployments upgraded to bookworm-based image - T368366
  • 17:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
  • 17:45 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1017.eqiad.wmnet with OS bookworm
  • 17:44 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:44 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 17:44 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:44 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:44 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:44 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:44 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 17:43 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 17:43 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:43 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
  • 17:43 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:43 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:43 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:37 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
  • 17:32 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "recover wdqs2024 from failed status T372919 - bking@cumin2002"
  • 17:32 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:31 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 17:31 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:31 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:31 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:31 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:31 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 17:30 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 17:30 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:30 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:30 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:30 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "recover wdqs2024 from failed status T372919 - bking@cumin2002"
  • 17:16 topranks: removing config for ssw1-a8-codfw link to lvs2014 T370897
  • 17:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:06 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:06 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:02 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:02 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd2004.codfw.wmnet with reason: host reimage
  • 16:56 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 16:55 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd2004.codfw.wmnet with reason: host reimage
  • 16:51 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lvs2013.codfw.wmnet on all recursors
  • 16:51 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache lvs2013.codfw.wmnet on all recursors
  • 16:50 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for lvs2014 - cmooney@cumin1002"
  • 16:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for lvs2014 - cmooney@cumin1002"
  • 16:43 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 16:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2040.codfw.wmnet
  • 16:41 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2040.codfw.wmnet
  • 16:41 claime: Pooling wikikube-worker2040.codfw.wmnet - T351074
  • 16:40 topranks: adding vlans to lsw1-d2-codfw for lvs2014 T370897
  • 16:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-sd2004.codfw.wmnet with OS bookworm
  • 16:38 claime: Running homer 'lsw1-a3-codfw*' commit 'T351074'
  • 16:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-sd2004']
  • 16:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-sd2004']
  • 16:28 mutante: LDAP - removed htriedman from wmf group, added htriedman to nda group (T371644)
  • 16:26 topranks: disabling BGP to PyBal on lvs2014 in preparation for move to new switch T370897
  • 16:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on lvs2014.codfw.wmnet with reason: move lvs2014 from asw to lsw
  • 16:24 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on lvs2014.codfw.wmnet with reason: move lvs2014 from asw to lsw
  • 16:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on lsw1-d2-codfw.mgmt with reason: move lvs2014 from asw to lsw
  • 16:23 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on lsw1-d2-codfw.mgmt with reason: move lvs2014 from asw to lsw
  • 16:22 topranks: beginning work to reimage lvs2014 onto per-rack vlan in codfw rack D2 and move to new switch T370897
  • 16:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-sd2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2040.codfw.wmnet with OS bullseye
  • 16:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-sd2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding logging-sd2004 to codfw - jhancock@cumin2002"
  • 16:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding logging-sd2004 to codfw - jhancock@cumin2002"
  • 16:05 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 16:05 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 15:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2040.codfw.wmnet with reason: host reimage
  • 15:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2040.codfw.wmnet with reason: host reimage
  • 15:52 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 15:52 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 15:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 15:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 15:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fb7528f2580>
  • 15:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2040
  • 15:35 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2040
  • 15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2040.codfw.wmnet 161.0.192.10.in-addr.arpa 1.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:34 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2040.codfw.wmnet 161.0.192.10.in-addr.arpa 1.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2040 - cgoubert@cumin1002"
  • 15:34 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2040 - cgoubert@cumin1002"
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fb7528f2580>
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2040.codfw.wmnet with OS bullseye
  • 15:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bookworm
  • 15:24 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s1
  • 15:24 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s3
  • 15:24 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1013.eqiad.wmnet with OS bookworm
  • 15:23 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:22 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:21 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 15:21 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 15:17 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2040.codfw.wmnet with OS bullseye
  • 15:17 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host <spicerack.netbox.NetboxServer object at 0x7fcd02d21d60>
  • 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fcd02d21d60>
  • 15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2040.codfw.wmnet with OS bullseye
  • 15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2291 to wikikube-worker2040
  • 15:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2040
  • 15:13 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2040
  • 15:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2291 to wikikube-worker2040 - cgoubert@cumin1002"
  • 15:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:12 brennen@deploy1003: Finished deploy [phabricator/deployment@89f5014]: deploy phab1004 for T372898 (duration: 00m 31s)
  • 15:11 brennen@deploy1003: Started deploy [phabricator/deployment@89f5014]: deploy phab1004 for T372898
  • 15:11 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2291 to wikikube-worker2040 - cgoubert@cumin1002"
  • 15:11 XioNoX: deploy pfw policy update 1724083328 - T372792
  • 15:10 brennen@deploy1003: Finished deploy [phabricator/deployment@89f5014]: deploy phab2002 for T372898 (test redux) (duration: 01m 22s)
  • 15:09 brennen@deploy1003: Started deploy [phabricator/deployment@89f5014]: deploy phab2002 for T372898 (test redux)
  • 15:07 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:07 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2291 to wikikube-worker2040
  • 15:05 brennen@deploy1003: Finished deploy [phabricator/deployment@89f5014]: deploy phab2002 for T372898 (duration: 00m 33s)
  • 15:04 brennen@deploy1003: Started deploy [phabricator/deployment@89f5014]: deploy phab2002 for T372898
  • 15:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:04 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2291.codfw.wmnet
  • 15:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:04 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2291.codfw.wmnet
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:00 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 14:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2035.codfw.wmnet with OS bookworm
  • 14:45 claime: Depooling mw2291.codfw.wmnet for rename and ip renumbering - T372878
  • 14:43 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:42 klausman@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:38 klausman@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 14:37 klausman@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:37 klausman@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:32 klausman@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:31 klausman@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:24 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1013.eqiad.wmnet with reason: host reimage
  • 14:22 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1013.eqiad.wmnet with reason: host reimage
  • 14:22 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:22 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:03 mforns@deploy1003: Finished deploy [airflow-dags/analytics@c202679]: (no justification provided) (duration: 00m 51s)
  • 14:02 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:02 mforns@deploy1003: Started deploy [airflow-dags/analytics@c202679]: (no justification provided)
  • 14:02 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:59 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s7
  • 13:59 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s2
  • 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:59 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1014.eqiad.wmnet with OS bookworm
  • 13:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 13:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 13:54 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 13:53 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 13:51 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:50 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:31 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1014.eqiad.wmnet with reason: host reimage
  • 13:28 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1014.eqiad.wmnet with reason: host reimage
  • 13:15 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1014.eqiad.wmnet with OS bookworm
  • 13:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1299.eqiad.wmnet with OS bullseye
  • 13:12 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:06 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1014.eqiad.wmnet with reason: Reimaging clouddb1014 T365424
  • 13:06 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1014.eqiad.wmnet with reason: Reimaging clouddb1014 T365424
  • 13:05 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s2
  • 13:05 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s7
  • 12:59 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 12:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:37 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 12:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:33 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 12:28 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 12:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 12:26 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 12:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:19 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Fix DeletedContributions for user names containing spaces (T372444), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413) (duration: 11m 38s)
  • 12:15 dreamyjazz@deploy1003: dreamyjazz, samtar: Continuing with sync
  • 12:12 dreamyjazz@deploy1003: dreamyjazz, samtar: Backport for Fix DeletedContributions for user names containing spaces (T372444), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:08 dreamyjazz@deploy1003: Started scap sync-world: Backport for Fix DeletedContributions for user names containing spaces (T372444), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413)
  • 09:42 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:42 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:42 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:41 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:41 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:40 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:40 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:39 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:39 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:37 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:36 claime: Deploying calico configuration for codfw row c/d lsw - 1062728
  • 09:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:15 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.19 refs T366964
  • 08:15 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:04 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 07:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1022.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 07:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Update Netbox wheels - ayounsi@cumin1002 - T371890
  • 07:14 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Update Netbox wheels - ayounsi@cumin1002 - T371890
  • 06:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Update Netbox-next wheels - ayounsi@cumin1002 - T371890
  • 06:47 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Update Netbox-next wheels - ayounsi@cumin1002 - T371890
  • 06:43 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 18:00:00 on wdqs[2021-2023,2025].codfw.wmnet with reason: T364368 non-prod hosts
  • 06:43 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on wdqs[2021-2023,2025].codfw.wmnet with reason: T364368 non-prod hosts
  • 06:43 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 05s)
  • 06:42 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 06:40 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1022.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 06:36 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 13s)
  • 06:36 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 05:22 marostegui: Deploy schema change on s1 eqiad old master db1184 dbmaint T367856
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1184 T372524', diff saved to https://phabricator.wikimedia.org/P67395 and previous config saved to /var/cache/conftool/dbconfig/20240820-051948-marostegui.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1163 to s1 primary and set section read-write T372524', diff saved to https://phabricator.wikimedia.org/P67394 and previous config saved to /var/cache/conftool/dbconfig/20240820-051843-marostegui.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T372524', diff saved to https://phabricator.wikimedia.org/P67393 and previous config saved to /var/cache/conftool/dbconfig/20240820-051821-root.json
  • 05:18 marostegui: Starting s1 eqiad failover from db1184 to db1163 - T372524
  • 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1163 with weight 0 T372524', diff saved to https://phabricator.wikimedia.org/P67392 and previous config saved to /var/cache/conftool/dbconfig/20240820-051726-marostegui.json
  • 05:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1184.eqiad.wmnet with reason: Long schema change
  • 05:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1184.eqiad.wmnet with reason: Long schema change
  • 04:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T372524
  • 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1163 with weight 0 T372524', diff saved to https://phabricator.wikimedia.org/P67391 and previous config saved to /var/cache/conftool/dbconfig/20240820-045212-root.json
  • 04:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T372524
  • 04:00 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.16 (duration: 00m 56s)
  • 03:49 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.43.0-wmf.19 refs T366964 (duration: 46m 32s)
  • 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.19 refs T366964
  • 00:21 mutante: previous message about prometheus can be ignored - race condition that solved itself on next puppet run
  • 00:04 mutante: prometheus3003/prometheus1006 - are trying to use puppetserver1002 but get connection refused from puppetserver1001.eqiad.wmnet port 8140 - causing other puppet errors

2024-08-19

  • 23:59 mutante: prometheus - puppet on prometheus hosts very slow - reason appears to be that /srv/prometheus is recursively managed by puppet but has ~ 20x more files than the default soft limit of 1000
  • 23:55 mutante: prometheus - switched ferm::service to firewall::service (gerrit:1057952) - NOOP except /etc/ferm/conf.d/10_prometheus-web becomes /etc/ferm/conf.d/10_prometheus_web with identical rules
  • 23:15 ejegg: fundraising civicrm upgraded from fd01c939 to 1022abf1
  • 22:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 22:12 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
  • 22:09 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
  • 21:50 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 21:48 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 21:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
  • 21:26 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
  • 21:07 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 21:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 20:57 eevans@deploy1003: Finished deploy [restbase/deploy@b504108] (beta): Dry run beta deployment test (duration: 00m 06s)
  • 20:57 eevans@deploy1003: Started deploy [restbase/deploy@b504108] (beta): Dry run beta deployment test
  • 20:52 sbassett: Deployed changes from T372570 to security.wikimedia.org (miscweb)
  • 20:49 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 20:49 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 20:49 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 20:49 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 20:49 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 20:48 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
  • 20:46 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 20:45 eevans@deploy1003: Finished deploy [restbase/deploy@b504108] (beta): Dry run beta deployment test (duration: 00m 32s)
  • 20:45 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 20:45 eevans@deploy1003: Started deploy [restbase/deploy@b504108] (beta): Dry run beta deployment test
  • 20:44 mforns@deploy1003: Finished deploy [airflow-dags/analytics_test@3ec5119]: (no justification provided) (duration: 00m 11s)
  • 20:44 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
  • 20:44 mforns@deploy1003: Started deploy [airflow-dags/analytics_test@3ec5119]: (no justification provided)
  • 20:42 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 20:26 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 20:26 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 20:00 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 19:59 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 19:54 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
  • 19:53 dancy@deploy1003: Started scap sync-world: testing T371904
  • 19:52 dancy@deploy1003: Installation of scap version "4.98.0" completed for 207 hosts
  • 19:52 dancy@deploy1003: Installing scap version "4.98.0" for 207 hosts
  • 19:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 19:45 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 19:45 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 19:29 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 19:29 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 19:28 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:28 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:07 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
  • 19:06 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 19:04 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 18:37 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 18:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 18:29 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:22 ejegg: fundraising civicrm upgraded from 56521963 to fd01c939
  • 18:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 18:13 mforns@deploy1003: Finished deploy [analytics/refinery@9eaecec] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9eaecec7] (duration: 03m 24s)
  • 18:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1299.eqiad.wmnet with reason: host reimage
  • 18:12 Lucas_WMDE: FINISHED lucaswerkmeister-wmde@mwmaint1002:~$ foreachwiki maintenance/cleanupTitles.php --prefix=T195546 --reporting-interval=1000000000 2>&1 | tee ~/T195546.log
  • 18:10 mforns@deploy1003: Started deploy [analytics/refinery@9eaecec] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9eaecec7]
  • 18:09 mforns@deploy1003: Finished deploy [analytics/refinery@9eaecec] (thin): Regular analytics weekly train THIN [analytics/refinery@9eaecec7] (duration: 04m 25s)
  • 18:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1299.eqiad.wmnet with reason: host reimage
  • 18:05 mforns@deploy1003: Started deploy [analytics/refinery@9eaecec] (thin): Regular analytics weekly train THIN [analytics/refinery@9eaecec7]
  • 17:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:55 mforns@deploy1003: Finished deploy [analytics/refinery@9eaecec]: Regular analytics weekly train [analytics/refinery@9eaecec7] (duration: 12m 30s)
  • 17:53 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1299.eqiad.wmnet with OS bullseye
  • 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1286.eqiad.wmnet with OS bullseye
  • 17:50 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1285.eqiad.wmnet with OS bullseye
  • 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2023.codfw.wmnet with OS bullseye
  • 17:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2025.codfw.wmnet with OS bullseye
  • 17:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2022.codfw.wmnet with OS bullseye
  • 17:42 mforns@deploy1003: Started deploy [analytics/refinery@9eaecec]: Regular analytics weekly train [analytics/refinery@9eaecec7]
  • 17:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:38 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:38 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:36 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:36 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 17:29 swfrench-wmf: statsd-exporter resource bumps (https://gerrit.wikimedia.org/r/1061856) are now everywhere - T371885
  • 17:27 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:27 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 17:27 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
  • 17:27 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:27 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:26 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:26 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 17:26 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 17:26 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:26 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:26 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:25 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:25 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:25 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
  • 17:20 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
  • 17:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
  • 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:13 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:11 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
  • 17:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:08 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd2002.codfw.wmnet with OS bookworm
  • 17:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bullseye
  • 17:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bullseye
  • 16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1304.eqiad.wmnet with OS bullseye
  • 16:57 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1303.eqiad.wmnet with OS bullseye
  • 16:57 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1302.eqiad.wmnet with OS bullseye
  • 16:57 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1301.eqiad.wmnet with OS bullseye
  • 16:57 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1300.eqiad.wmnet with OS bullseye
  • 16:57 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:47 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 16:42 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 16:42 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet, repooling neither afterwards
  • 16:41 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet, repooling neither afterwards
  • 16:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd2002.codfw.wmnet with reason: host reimage
  • 16:38 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet, repooling neither afterwards
  • 16:37 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd2002.codfw.wmnet with reason: host reimage
  • 16:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet [reason: [done] T372160]
  • 16:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 16:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for Reduce rate-limit for trusted editors of commons to 1500 every 3m (T370304) (duration: 06m 33s)
  • 16:25 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 16:23 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2025.codfw.wmnet with OS bullseye
  • 16:23 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
  • 16:23 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2023.codfw.wmnet with OS bullseye
  • 16:23 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2022.codfw.wmnet with OS bullseye
  • 16:21 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 16:21 ladsgroup@deploy1003: ladsgroup: Backport for Reduce rate-limit for trusted editors of commons to 1500 every 3m (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd2001.codfw.wmnet with OS bookworm
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd2003.codfw.wmnet with OS bookworm
  • 16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:19 ladsgroup@deploy1003: Started scap sync-world: Backport for Reduce rate-limit for trusted editors of commons to 1500 every 3m (T370304)
  • 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:07 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd2001.codfw.wmnet with reason: host reimage
  • 15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd2003.codfw.wmnet with reason: host reimage
  • 15:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd2001.codfw.wmnet with reason: host reimage
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd2003.codfw.wmnet with reason: host reimage
  • 15:39 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2035.codfw.wmnet [reason: T372160]
  • 15:36 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp6016*} and A:cp for 9.2.5-1wm2
  • 15:32 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp6016*} and A:cp for 9.2.5-1wm2
  • 15:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-sd2003.codfw.wmnet with OS bookworm
  • 15:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-sd2002.codfw.wmnet with OS bookworm
  • 15:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-sd2001.codfw.wmnet with OS bookworm
  • 15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logging-sd2003']
  • 15:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-sd2003']
  • 15:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logging-sd2002']
  • 15:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:25 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-sd2002']
  • 15:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-sd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:19 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1301.eqiad.wmnet with reason: host reimage
  • 15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-sd2001']
  • 15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-sd2003']
  • 15:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1302.eqiad.wmnet with reason: host reimage
  • 14:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1304.eqiad.wmnet with reason: host reimage
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-sd2003']
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-sd2001']
  • 14:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1303.eqiad.wmnet with reason: host reimage
  • 14:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1300.eqiad.wmnet with reason: host reimage
  • 14:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1301.eqiad.wmnet with reason: host reimage
  • 14:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1302.eqiad.wmnet with reason: host reimage
  • 14:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1303.eqiad.wmnet with reason: host reimage
  • 14:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1304.eqiad.wmnet with reason: host reimage
  • 14:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1300.eqiad.wmnet with reason: host reimage
  • 14:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:33 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:32 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:32 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:32 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1304.eqiad.wmnet with OS bullseye
  • 14:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1302.eqiad.wmnet with OS bullseye
  • 14:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1303.eqiad.wmnet with OS bullseye
  • 14:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1301.eqiad.wmnet with OS bullseye
  • 14:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1300.eqiad.wmnet with OS bullseye
  • 14:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:29 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1303.eqiad.wmnet with OS bullseye
  • 14:29 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1301.eqiad.wmnet with OS bullseye
  • 14:29 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1300.eqiad.wmnet with OS bullseye
  • 14:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1303.eqiad.wmnet with OS bullseye
  • 14:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1301.eqiad.wmnet with OS bullseye
  • 14:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1300.eqiad.wmnet with OS bullseye
  • 14:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on wdqs[1022,1024].eqiad.wmnet with reason: noisy alerts, will look at later in the day
  • 14:27 bking@cumin2002: START - Cookbook sre.hosts.downtime for 5:00:00 on wdqs[1022,1024].eqiad.wmnet with reason: noisy alerts, will look at later in the day
  • 13:34 Lucas_WMDE: UTC afternoon backport+config window done (except for the T195546 maintenance script which is expected to keep running for a few more hours, currently at commonswiki)
  • 13:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:27 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for (de|uk|ja|he|fi)wiki: enable shellbox-video (T356241) (duration: 06m 57s)
  • 13:23 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 13:23 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s6
  • 13:22 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, hnowlan: Continuing with sync
  • 13:22 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, hnowlan: Backport for (de|uk|ja|he|fi)wiki: enable shellbox-video (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for (de|uk|ja|he|fi)wiki: enable shellbox-video (T356241)
  • 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for Define wgVirtualDomainsMapping for virtual-checkuser-global (T371724) (duration: 10m 23s)
  • 13:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T367856)', diff saved to https://phabricator.wikimedia.org/P67386 and previous config saved to /var/cache/conftool/dbconfig/20240819-131702-marostegui.json
  • 13:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 13:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T367856)', diff saved to https://phabricator.wikimedia.org/P67385 and previous config saved to /var/cache/conftool/dbconfig/20240819-131640-marostegui.json
  • 13:16 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1015.eqiad.wmnet with OS bookworm
  • 13:13 logmsgbot: lucaswerkmeister-wmde@deploy1003 dreamyjazz, lucaswerkmeister-wmde: Continuing with sync
  • 13:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:10 logmsgbot: lucaswerkmeister-wmde@deploy1003 dreamyjazz, lucaswerkmeister-wmde: Backport for Define wgVirtualDomainsMapping for virtual-checkuser-global (T371724) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 13:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "rdb1014 back to active - cgoubert@cumin1002 - T370633"
  • 13:09 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "rdb1014 back to active - cgoubert@cumin1002 - T370633"
  • 13:07 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for Define wgVirtualDomainsMapping for virtual-checkuser-global (T371724)
  • 13:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:02 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint1002:~$ foreachwiki maintenance/cleanupTitles.php --prefix=T195546 --reporting-interval=1000000000 2>&1 | tee ~/T195546.log
  • 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P67384 and previous config saved to /var/cache/conftool/dbconfig/20240819-130132-marostegui.json
  • 13:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:57 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 12:49 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1015.eqiad.wmnet with reason: host reimage
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P67383 and previous config saved to /var/cache/conftool/dbconfig/20240819-124625-marostegui.json
  • 12:45 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1015.eqiad.wmnet with reason: host reimage
  • 12:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:39 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 12:38 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:38 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:37 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:37 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:37 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:33 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1015.eqiad.wmnet with OS bookworm
  • 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T367856)', diff saved to https://phabricator.wikimedia.org/P67382 and previous config saved to /var/cache/conftool/dbconfig/20240819-123119-marostegui.json
  • 12:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:27 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1015.eqiad.wmnet with reason: Reimaging clouddb1015 T365424
  • 12:27 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1015.eqiad.wmnet with reason: Reimaging clouddb1015 T365424
  • 12:26 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s6
  • 12:26 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Enable temporary accounts on test2wiki (T371116) (duration: 22m 14s)
  • 12:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:18 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 12:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:16 dreamyjazz@deploy1003: dreamyjazz: Backport for Enable temporary accounts on test2wiki (T371116) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:15 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:11 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:03 kevinbazira@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:03 dreamyjazz@deploy1003: Started scap sync-world: Backport for Enable temporary accounts on test2wiki (T371116)
  • 12:01 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:56 Dreamy_Jazz: Started scanning script for ruwiki with timeout of 6h to catchup to monthly request limit
  • 11:49 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 11:30 kevinbazira@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:27 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 10:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 10:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 10:29 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 10:14 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:10 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 10:10 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67378 and previous config saved to /var/cache/conftool/dbconfig/20240819-100847-root.json
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67377 and previous config saved to /var/cache/conftool/dbconfig/20240819-095342-root.json
  • 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67376 and previous config saved to /var/cache/conftool/dbconfig/20240819-093836-root.json
  • 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67375 and previous config saved to /var/cache/conftool/dbconfig/20240819-092331-root.json
  • 09:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67374 and previous config saved to /var/cache/conftool/dbconfig/20240819-090825-root.json
  • 09:07 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 09:06 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67373 and previous config saved to /var/cache/conftool/dbconfig/20240819-085320-root.json
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67372 and previous config saved to /var/cache/conftool/dbconfig/20240819-083814-root.json
  • 08:35 marostegui: Upgrade db2136 to 10.11.9 T372551
  • 08:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2136.codfw.wmnet with reason: Upgrade to 10.11.9
  • 08:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2136.codfw.wmnet with reason: Upgrade to 10.11.9
  • 08:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P67371 and previous config saved to /var/cache/conftool/dbconfig/20240819-083439-root.json
  • 08:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 08:32 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 08:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 08:31 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 08:18 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:18 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding AAAA field to snapshot1010 and dumpsdata1003 - brouberol@cumin1002"
  • 08:18 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding AAAA field to snapshot1010 and dumpsdata1003 - brouberol@cumin1002"
  • 08:14 brouberol@cumin1002: START - Cookbook sre.dns.netbox
  • 07:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 07:25 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 07:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 07:24 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 07:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 07:16 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 07:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 07:14 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67370 and previous config saved to /var/cache/conftool/dbconfig/20240819-070034-root.json
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67369 and previous config saved to /var/cache/conftool/dbconfig/20240819-064528-root.json
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67368 and previous config saved to /var/cache/conftool/dbconfig/20240819-063023-root.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67367 and previous config saved to /var/cache/conftool/dbconfig/20240819-061517-root.json
  • 06:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67366 and previous config saved to /var/cache/conftool/dbconfig/20240819-060011-root.json
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67365 and previous config saved to /var/cache/conftool/dbconfig/20240819-054506-root.json
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67364 and previous config saved to /var/cache/conftool/dbconfig/20240819-053000-root.json
  • 05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1195.eqiad.wmnet with reason: Upgrade to 10.6.19
  • 05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1195.eqiad.wmnet with reason: Upgrade to 10.6.19
  • 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1195 T372536', diff saved to https://phabricator.wikimedia.org/P67363 and previous config saved to /var/cache/conftool/dbconfig/20240819-052352-root.json

2024-08-18

  • 22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Setting back s4 to RW', diff saved to https://phabricator.wikimedia.org/P67362 and previous config saved to /var/cache/conftool/dbconfig/20240818-220355-ladsgroup.json
  • 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set s4 as read-only ', diff saved to https://phabricator.wikimedia.org/P67361 and previous config saved to /var/cache/conftool/dbconfig/20240818-220043-ladsgroup.json
  • 20:54 kamila@cumin2002: dbctl commit (dc=all): 'Setting s4 back to RW', diff saved to https://phabricator.wikimedia.org/P67360 and previous config saved to /var/cache/conftool/dbconfig/20240818-205410-kamila.json
  • 20:50 kamila@cumin2002: dbctl commit (dc=all): 'Set s4 as read-only due to overload', diff saved to https://phabricator.wikimedia.org/P67359 and previous config saved to /var/cache/conftool/dbconfig/20240818-205024-kamila.json

2024-08-17

  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67358 and previous config saved to /var/cache/conftool/dbconfig/20240817-113358-root.json
  • 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67357 and previous config saved to /var/cache/conftool/dbconfig/20240817-111852-root.json
  • 11:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67356 and previous config saved to /var/cache/conftool/dbconfig/20240817-110347-root.json
  • 10:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67355 and previous config saved to /var/cache/conftool/dbconfig/20240817-104841-root.json
  • 10:33 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67354 and previous config saved to /var/cache/conftool/dbconfig/20240817-103336-root.json
  • 10:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67353 and previous config saved to /var/cache/conftool/dbconfig/20240817-101831-root.json
  • 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67352 and previous config saved to /var/cache/conftool/dbconfig/20240817-100325-root.json
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T367856)', diff saved to https://phabricator.wikimedia.org/P67351 and previous config saved to /var/cache/conftool/dbconfig/20240817-095320-marostegui.json
  • 09:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2152.codfw.wmnet with reason: Maintenance

2024-08-16

  • 23:27 eevans@deploy1003: deploy aborted: Test deploy (duration: 00m 25s)
  • 23:27 eevans@deploy1003: Started deploy [cassandra/logstash-logback-encoder@42653e6] (beta): Test deploy
  • 20:26 eevans@deploy1003: Finished deploy [cassandra/logstash-logback-encoder@42653e6] (aqs): Test (duration: 00m 32s)
  • 20:26 eevans@deploy1003: Started deploy [cassandra/logstash-logback-encoder@42653e6] (aqs): Test
  • 20:15 eevans@deploy1003: Finished deploy [cassandra/logstash-logback-encoder@42653e6] (beta): Beta deploy (duration: 00m 31s)
  • 20:14 eevans@deploy1003: Started deploy [cassandra/logstash-logback-encoder@42653e6] (beta): Beta deploy
  • 20:12 eevans@deploy1003: Finished deploy [restbase/deploy@f696b76] (beta): deploy to beta (duration: 01m 05s)
  • 20:11 eevans@deploy1003: Started deploy [restbase/deploy@f696b76] (beta): deploy to beta
  • 20:11 eevans@deploy1003: deploy aborted: deploy to beta (duration: 00m 28s)
  • 20:10 eevans@deploy1003: Started deploy [restbase/deploy@f696b76] (beta): deploy to beta
  • 20:04 eevans@deploy1003: deploy aborted: (no justification provided) (duration: 00m 11s)
  • 20:04 eevans@deploy1003: Started deploy [restbase/deploy@f696b76] (beta): (no justification provided)
  • 20:01 eevans@deploy1003: deploy aborted: (no justification provided) (duration: 00m 20s)
  • 20:01 eevans@deploy1003: Started deploy [restbase/deploy@f696b76] (beta): (no justification provided)
  • 19:59 eevans@deploy1003: Finished deploy [restbase/deploy@f696b76] (beta): (no justification provided) (duration: 00m 33s)
  • 19:59 eevans@deploy1003: Started deploy [restbase/deploy@f696b76] (beta): (no justification provided)
  • 17:18 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:18 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-sd2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-sd2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-sd2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-sd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-sd2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:30 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding logging-sd2004 to codfw - jhancock@cumin2002"
  • 15:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding logging-sd2004 to codfw - jhancock@cumin2002"
  • 15:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:12 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:11 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:10 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:10 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:08 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2043
  • 15:08 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2043
  • 15:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2043 to codfw - jhancock@cumin2002"
  • 15:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2043 to codfw - jhancock@cumin2002"
  • 15:06 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:05 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:04 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2042
  • 15:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2042
  • 15:04 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2042 to codfw - jhancock@cumin2002"
  • 15:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2042 to codfw - jhancock@cumin2002"
  • 15:00 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2041
  • 14:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2041
  • 14:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2041 to codfw - jhancock@cumin2002"
  • 14:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2041 to codfw - jhancock@cumin2002"
  • 14:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2040
  • 14:52 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2040
  • 14:52 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 14:51 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2040 to codfw - jhancock@cumin2002"
  • 14:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2040 to codfw - jhancock@cumin2002"
  • 14:49 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
  • 14:48 fnegri@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1017.eqiad.wmnet with OS bookworm
  • 14:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:46 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2039
  • 14:46 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2039
  • 14:46 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2039 to codfw - jhancock@cumin2002"
  • 14:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2039 to codfw - jhancock@cumin2002"
  • 14:42 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2038
  • 14:35 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2038
  • 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2038 to codfw - jhancock@cumin2002"
  • 14:35 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 14:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2038 to codfw - jhancock@cumin2002"
  • 14:34 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 14:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:27 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 14:26 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2037
  • 14:26 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2037
  • 14:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2037 to codfw - jhancock@cumin2002"
  • 14:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2037 to codfw - jhancock@cumin2002"
  • 14:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
  • 14:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 14:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:21 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti2037
  • 14:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2037
  • 14:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:03 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 14:00 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2036
  • 13:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2036
  • 13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2035
  • 13:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2035
  • 13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2036 to codfw - jhancock@cumin2002"
  • 13:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2036 to codfw - jhancock@cumin2002"
  • 13:55 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 13:51 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1017.eqiad.wmnet with OS bookworm
  • 13:48 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 13:45 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 13:43 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1017.eqiad.wmnet with reason: Reimaging clouddb1017 T365424
  • 13:43 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1017.eqiad.wmnet with reason: Reimaging clouddb1017 T365424
  • 13:41 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
  • 13:41 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 13:26 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 12:49 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:21 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:21 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1007.eqiad.wmnet with OS bullseye
  • 10:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1009.eqiad.wmnet with OS bullseye
  • 10:16 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1010.eqiad.wmnet with OS bullseye
  • 10:14 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1008.eqiad.wmnet with OS bullseye
  • 10:10 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1006.eqiad.wmnet with OS bullseye
  • 10:05 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1007.eqiad.wmnet with reason: host reimage
  • 10:02 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1009.eqiad.wmnet with reason: host reimage
  • 09:58 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
  • 09:58 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:57 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:56 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1008.eqiad.wmnet with reason: host reimage
  • 09:53 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
  • 09:53 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1006.eqiad.wmnet with reason: host reimage
  • 09:51 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1009.eqiad.wmnet with reason: host reimage
  • 09:51 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1008.eqiad.wmnet with reason: host reimage
  • 09:51 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1007.eqiad.wmnet with reason: host reimage
  • 09:50 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1006.eqiad.wmnet with reason: host reimage
  • 09:50 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 09:46 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 09:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 09:43 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 09:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1010.eqiad.wmnet with OS bullseye
  • 09:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1009.eqiad.wmnet with OS bullseye
  • 09:34 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1008.eqiad.wmnet with OS bullseye
  • 09:34 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1007.eqiad.wmnet with OS bullseye
  • 09:33 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1006.eqiad.wmnet with OS bullseye
  • 09:30 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:29 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:23 pfischer@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:23 pfischer@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:52 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1010.eqiad.wmnet with OS bullseye
  • 08:50 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1009.eqiad.wmnet with OS bullseye
  • 08:49 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1008.eqiad.wmnet with OS bullseye
  • 08:48 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1007.eqiad.wmnet with OS bullseye
  • 08:47 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1006.eqiad.wmnet with OS bullseye
  • 08:20 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:20 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:05 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1010.eqiad.wmnet with OS bullseye
  • 08:03 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1009.eqiad.wmnet with OS bullseye
  • 08:02 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1008.eqiad.wmnet with OS bullseye
  • 08:01 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1007.eqiad.wmnet with OS bullseye
  • 08:00 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1006.eqiad.wmnet with OS bullseye
  • 07:43 XioNoX: deploy pfw policy update 1723675086 - T372520
  • 07:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2007.codfw.wmnet with OS bullseye
  • 07:23 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2007.codfw.wmnet with reason: host reimage
  • 07:20 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2007.codfw.wmnet with reason: host reimage
  • 07:01 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2007.codfw.wmnet with OS bullseye
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repool db2136 - running 10.11', diff saved to https://phabricator.wikimedia.org/P67345 and previous config saved to /var/cache/conftool/dbconfig/20240816-065606-marostegui.json
  • 06:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change
  • 06:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change

2024-08-15

  • 23:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 23:10 xSavitar: T372449 mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Philip Federici' 'FilippoFederici' --ignorestatus
  • 22:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 22:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 22:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
  • 22:02 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:54 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 21:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards
  • 21:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards
  • 21:47 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:46 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
  • 21:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
  • 21:37 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:01 ebernhardson: backport window complete
  • 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2035 to codfw - jhancock@cumin2002"
  • 20:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2035 to codfw - jhancock@cumin2002"
  • 20:54 ejegg: fundraising civicrm upgraded from eecbba5d to 56521963
  • 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 20:45 ebernhardson@deploy1003: Finished scap sync-world: Backport for cirrus: Stop general writes to private wikis (T341332) (duration: 08m 25s)
  • 20:41 ebernhardson@deploy1003: ebernhardson: Continuing with sync
  • 20:39 ebernhardson@deploy1003: ebernhardson: Backport for cirrus: Stop general writes to private wikis (T341332) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:37 ebernhardson@deploy1003: Started scap sync-world: Backport for cirrus: Stop general writes to private wikis (T341332)
  • 20:30 ebernhardson@deploy1003: Finished scap sync-world: Backport for Revert "CommentFormatter: Switch from deprecated addJsConfigVars to new setJsConfigVar" (T372499) (duration: 10m 06s)
  • 20:25 ebernhardson@deploy1003: ebernhardson, matmarex: Continuing with sync
  • 20:23 ebernhardson@deploy1003: ebernhardson, matmarex: Backport for Revert "CommentFormatter: Switch from deprecated addJsConfigVars to new setJsConfigVar" (T372499) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:20 ebernhardson@deploy1003: Started scap sync-world: Backport for Revert "CommentFormatter: Switch from deprecated addJsConfigVars to new setJsConfigVar" (T372499)
  • away: running global rename cleanup script per T372006#10055573
  • 18:15 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.18 refs T366963
  • 18:02 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1001.eqiad.wmnet
  • 18:00 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
  • 17:45 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:44 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update mgmt dns for civi2002 frpig2002 - dwisehaupt@cumin1002"
  • 17:44 dwisehaupt@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update mgmt dns for civi2002 frpig2002 - dwisehaupt@cumin1002"
  • 17:41 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
  • 17:22 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site ulsfo [reason: testing done, T369366]
  • 17:22 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site ulsfo [reason: testing done, T369366]
  • 17:13 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2008.codfw.wmnet with OS bullseye
  • 17:07 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site ulsfo [reason: testing live change, T369366]
  • 17:07 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site ulsfo [reason: testing live change, T369366]
  • 16:54 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2008.codfw.wmnet with reason: host reimage
  • 16:53 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 16:52 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 16:52 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 16:51 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2008.codfw.wmnet with reason: host reimage
  • 16:51 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:55 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2007.codfw.wmnet with OS bullseye
  • 15:53 SandraEbele_: reran druid_load_geoeditors_monthly, cassandra_load_editors_by_country_monthly, and druid_load_edit_hourly airflow dags with run_id scheduled__2024-06-01T00:00:00+00:00 as part of down stream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
  • 15:52 sukhe: sudo cumin -b1 -s60 "A:dnsbox" "run-puppet-agent --enable 'merging CR 1053929 T369366'": T369366
  • 15:51 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 15:45 sukhe: running authdns-update again
  • 15:43 sukhe: running authdns-update
  • 15:31 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:31 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:27 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2008.codfw.wmnet with OS bullseye
  • 15:21 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 15:21 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 15:21 sukhe: running authdns-update
  • 15:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: moving ahead with admin_state migration]
  • 15:10 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: no reason specified, no task ID specified]
  • 15:09 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: no reason specified, no task ID specified]
  • 15:09 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 15:09 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 15:04 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 15:03 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 15:02 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams [reason: testing on dns4004, no task ID specified]
  • 15:01 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: testing on dns4004, no task ID specified]
  • 15:01 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru [reason: testing on dns4004, no task ID specified]
  • 15:00 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: testing on dns4004, no task ID specified]
  • 14:57 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:53 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:49 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:48 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:47 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:46 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:43 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 14:41 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:36 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: testing on dns4004, no task ID specified]
  • 14:36 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: testing on dns4004, no task ID specified]
  • 14:35 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:34 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: testing on dns4004, no task ID specified]
  • 14:33 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: testing on dns4004, no task ID specified]
  • 14:25 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2007.codfw.wmnet with OS bullseye
  • 14:21 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:21 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:01 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site magru for service: text-addrs|text-next [reason: testing on dns4004, no task ID specified]
  • 14:00 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site magru for service: text-addrs|text-next [reason: testing on dns4004, no task ID specified]
  • 13:59 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: testing on dns4004, no task ID specified]
  • 13:59 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: testing on dns4004, no task ID specified]
  • 13:54 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns4004.wikimedia.org [reason: admin_state migration test]
  • 13:54 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns4004.wikimedia.org,service=recdns [reason: admin_state migration test]
  • 13:52 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:51 sukhe: sudo cumin "A:dnsbox" 'disable-puppet "merging CR 1053929 T369366"'
  • 13:50 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for Save the request before starting the automatic vanish job (T372006) (duration: 34m 44s)
  • 13:50 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:47 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:46 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:45 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:45 logmsgbot: lucaswerkmeister-wmde@deploy1003 seddon, lucaswerkmeister-wmde: Continuing with sync
  • 13:44 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:43 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:42 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:40 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:40 logmsgbot: lucaswerkmeister-wmde@deploy1003 seddon, lucaswerkmeister-wmde: Backport for Save the request before starting the automatic vanish job (T372006) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:38 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 13:35 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:35 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:34 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:34 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:34 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:33 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:33 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:32 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:31 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 13:31 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:30 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 13:26 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:25 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:23 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:22 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:16 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for Save the request before starting the automatic vanish job (T372006)
  • 12:52 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2009.codfw.wmnet with OS bullseye
  • 12:49 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2010.codfw.wmnet with OS bullseye
  • 12:34 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2009.codfw.wmnet with reason: host reimage
  • 12:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2010.codfw.wmnet with reason: host reimage
  • 12:29 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2009.codfw.wmnet with reason: host reimage
  • 12:28 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2010.codfw.wmnet with reason: host reimage
  • 12:26 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:26 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:25 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:23 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:10 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye
  • 12:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye
  • 11:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67341 and previous config saved to /var/cache/conftool/dbconfig/20240815-114213-root.json
  • 11:27 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67340 and previous config saved to /var/cache/conftool/dbconfig/20240815-112707-root.json
  • 11:24 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67339 and previous config saved to /var/cache/conftool/dbconfig/20240815-111201-root.json
  • 11:04 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:00 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67338 and previous config saved to /var/cache/conftool/dbconfig/20240815-105656-root.json
  • 10:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67337 and previous config saved to /var/cache/conftool/dbconfig/20240815-104150-root.json
  • 10:36 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2006.codfw.wmnet with OS bullseye
  • 10:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1125.eqiad.wmnet with reason: Upgrade to 10.6.19
  • 10:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1125.eqiad.wmnet with reason: Upgrade to 10.6.19
  • 10:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc1014.eqiad.wmnet with reason: Upgrade to 10.6.19
  • 10:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on pc1014.eqiad.wmnet with reason: Upgrade to 10.6.19
  • 10:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2014.codfw.wmnet with reason: Upgrade to 10.6.19
  • 10:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2014.codfw.wmnet with reason: Upgrade to 10.6.19
  • 10:27 marostegui: Install 10.6.19 on pc1014 db1125 pc2014 T372536
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67336 and previous config saved to /var/cache/conftool/dbconfig/20240815-102645-root.json
  • 10:21 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:19 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:18 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2006.codfw.wmnet with reason: host reimage
  • 10:15 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2006.codfw.wmnet with reason: host reimage
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67335 and previous config saved to /var/cache/conftool/dbconfig/20240815-101139-root.json
  • 09:55 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bullseye
  • 09:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change
  • 09:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change
  • 09:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T367856)', diff saved to https://phabricator.wikimedia.org/P67334 and previous config saved to /var/cache/conftool/dbconfig/20240815-092502-marostegui.json
  • 09:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 09:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 08:55 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2006.codfw.wmnet with OS bullseye
  • 08:04 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bullseye
  • 08:00 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2006.codfw.wmnet with OS bullseye
  • 07:47 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2009.codfw.wmnet with OS bullseye
  • 07:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 10:00:00 on 9 hosts with reason: T364368 non-prod hosts
  • 07:31 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 10:00:00 on 9 hosts with reason: T364368 non-prod hosts
  • 07:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bullseye
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67333 and previous config saved to /var/cache/conftool/dbconfig/20240815-063734-root.json
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67332 and previous config saved to /var/cache/conftool/dbconfig/20240815-062229-root.json
  • 06:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67331 and previous config saved to /var/cache/conftool/dbconfig/20240815-060723-root.json
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67330 and previous config saved to /var/cache/conftool/dbconfig/20240815-055218-root.json
  • 05:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67329 and previous config saved to /var/cache/conftool/dbconfig/20240815-053712-root.json
  • 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67328 and previous config saved to /var/cache/conftool/dbconfig/20240815-052206-root.json
  • 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67327 and previous config saved to /var/cache/conftool/dbconfig/20240815-050701-root.json
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1223 T372393', diff saved to https://phabricator.wikimedia.org/P67326 and previous config saved to /var/cache/conftool/dbconfig/20240815-050613-root.json
  • 05:04 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1189 to s3 primary and set section read-write T372393', diff saved to https://phabricator.wikimedia.org/P67325 and previous config saved to /var/cache/conftool/dbconfig/20240815-050428-root.json
  • 05:04 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T372393', diff saved to https://phabricator.wikimedia.org/P67324 and previous config saved to /var/cache/conftool/dbconfig/20240815-050410-root.json
  • 05:03 marostegui: Starting s3 eqiad failover from db1223 to db1189 - T372393
  • 04:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Stop MariaDB on db1238 T371342
  • 04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Stop MariaDB on db1238 T371342
  • 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T372393
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1189 with weight 0 T372393', diff saved to https://phabricator.wikimedia.org/P67323 and previous config saved to /var/cache/conftool/dbconfig/20240815-044929-root.json
  • 04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T372393
  • 03:26 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:26 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix mgmt DNS for fd2004 - pt1979@cumin2002"
  • 03:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix mgmt DNS for fd2004 - pt1979@cumin2002"
  • 03:22 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 02:24 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@02f37cf]: (no justification provided) (duration: 00m 43s)
  • 02:23 milimetric@deploy1003: Started deploy [airflow-dags/analytics@02f37cf]: (no justification provided)

2024-08-14

  • 23:34 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:33 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:30 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:30 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:09 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:09 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:07 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:05 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
  • 22:56 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:56 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:52 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:51 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:50 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:50 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:50 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:48 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
  • 22:48 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:48 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:28 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:28 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:17 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:15 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:07 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:05 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:31 jhuneidi@deploy1003: Finished scap sync-world: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis" (duration: 06m 40s)
  • 20:26 jhuneidi@deploy1003: trainbranchbot, jhuneidi: Continuing with sync
  • 20:26 jhuneidi@deploy1003: trainbranchbot, jhuneidi: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:24 jhuneidi@deploy1003: Started scap sync-world: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis"
  • 20:21 jhuneidi@deploy1003: Sync cancelled.
  • 20:14 jhuneidi@deploy1003: ihurbain, jhuneidi: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:11 jhuneidi@deploy1003: Started scap sync-world: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis
  • 19:26 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@6d50458]: Test Refine through Airflow (duration: 00m 12s)
  • 19:26 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@6d50458]: Test Refine through Airflow
  • 18:14 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.18 refs T366963
  • 17:50 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2010.codfw.wmnet with OS bullseye
  • 17:35 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: no reason specified, no task ID specified]
  • 17:35 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: no reason specified, no task ID specified]
  • 17:32 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams for service: text-addrs|text-next [reason: no reason specified, no task ID specified]
  • 17:32 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site esams for service: text-addrs|text-next [reason: no reason specified, no task ID specified]
  • 17:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 17:31 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 17:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru for service: text-addrs [reason: no reason specified, no task ID specified]
  • 17:31 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site magru for service: text-addrs [reason: no reason specified, no task ID specified]
  • 17:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 17:31 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 17:30 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: testing cookbook, T369366]
  • 17:30 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: testing cookbook, T369366]
  • 17:30 sukhe@cumin1002: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: pool site eqiad [reason: no reason specified, no task ID specified]
  • 17:30 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: no reason specified, no task ID specified]
  • 17:30 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 17:30 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
  • 17:17 otto@deploy1003: Finished deploy [airflow-dags/analytics_product@6d50458]: (no justification provided) (duration: 00m 08s)
  • 17:17 otto@deploy1003: Started deploy [airflow-dags/analytics_product@6d50458]: (no justification provided)
  • 17:16 SandraEbele_: reran geoeditors_public_monthly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
  • 17:13 ladsgroup@deploy1003: Finished scap sync-world: Backport for Avoid primary DB query for non-talk page edits (T370304), Avoid primary DB query for non-talk page edits (T370304) (duration: 07m 54s)
  • 17:12 SandraEbele_: reran geoeditors_monthly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
  • 17:09 SandraEbele_: reran geoeditors_edits_monthly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
  • 17:08 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 17:07 ladsgroup@deploy1003: ladsgroup: Backport for Avoid primary DB query for non-talk page edits (T370304), Avoid primary DB query for non-talk page edits (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:05 ladsgroup@deploy1003: Started scap sync-world: Backport for Avoid primary DB query for non-talk page edits (T370304), Avoid primary DB query for non-talk page edits (T370304)
  • 16:59 otto@deploy1003: Finished deploy [analytics/refinery@f033576]: Regular analytics weekly train [analytics/refinery@f0335766] (duration: 06m 48s)
  • 16:55 SandraEbele_: reran unique_editors_by_country_monthly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
  • 16:52 SandraEbele_: reran edit_hourly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
  • 16:52 otto@deploy1003: Started deploy [analytics/refinery@f033576]: Regular analytics weekly train [analytics/refinery@f0335766]
  • 16:52 otto@deploy1003: Finished deploy [analytics/refinery@f033576] (thin): Regular analytics weekly train THIN [analytics/refinery@f0335766] (duration: 04m 13s)
  • 16:48 SandraEbele_: reran editors_daily_monthly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize dag
  • 16:48 otto@deploy1003: Started deploy [analytics/refinery@f033576] (thin): Regular analytics weekly train THIN [analytics/refinery@f0335766]
  • 16:45 otto@deploy1003: Finished deploy [analytics/refinery@f033576] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f0335766] (duration: 03m 06s)
  • 16:45 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 16:43 ladsgroup@deploy1003: ladsgroup: Backport for Avoid primary DB query for non-talk page edits (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:42 otto@deploy1003: Started deploy [analytics/refinery@f033576] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f0335766]
  • 16:41 ladsgroup@deploy1003: Started scap sync-world: Backport for Avoid primary DB query for non-talk page edits (T370304)
  • 16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67318 and previous config saved to /var/cache/conftool/dbconfig/20240814-162854-arnaudb.json
  • 16:24 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2010.codfw.wmnet with OS bullseye
  • 16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67317 and previous config saved to /var/cache/conftool/dbconfig/20240814-161350-arnaudb.json
  • 16:04 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 16:04 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 16:03 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 16:01 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye
  • 15:58 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67316 and previous config saved to /var/cache/conftool/dbconfig/20240814-155844-arnaudb.json
  • 15:48 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:47 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:43 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67315 and previous config saved to /var/cache/conftool/dbconfig/20240814-154338-arnaudb.json
  • 15:40 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:39 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:39 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:39 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:39 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:39 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:34 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye
  • 15:28 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 16%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67314 and previous config saved to /var/cache/conftool/dbconfig/20240814-152833-arnaudb.json
  • 15:13 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 8%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67312 and previous config saved to /var/cache/conftool/dbconfig/20240814-151328-arnaudb.json
  • 14:59 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2010.codfw.wmnet
  • 14:58 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 4%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67307 and previous config saved to /var/cache/conftool/dbconfig/20240814-145819-arnaudb.json
  • 14:53 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2010.codfw.wmnet
  • 14:49 jayme@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-main2010.codfw.wmnet with OS bookworm
  • 14:43 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bookworm
  • 14:43 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67305 and previous config saved to /var/cache/conftool/dbconfig/20240814-144314-arnaudb.json
  • 14:32 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 14:28 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67304 and previous config saved to /var/cache/conftool/dbconfig/20240814-142808-arnaudb.json
  • 14:27 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 14:22 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 14:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1 es1029 depooling for hdd hotswap', diff saved to https://phabricator.wikimedia.org/P67299 and previous config saved to /var/cache/conftool/dbconfig/20240814-142147-arnaudb.json
  • 14:21 ebernhardson@deploy1003: Synchronized private/PrivateSettings.php: Update NetworkSession users list for T341332 (duration: 12m 33s)
  • 14:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:55 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 13:55 elukey@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 13:52 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:33 kartik@deploy1003: Finished scap sync-world: Backport for Use the updated recommendation API from liftwing (T371465) (duration: 07m 51s)
  • 13:32 jayme@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-main2010.codfw.wmnet']
  • 13:29 kartik@deploy1003: kartik: Continuing with sync
  • 13:28 kartik@deploy1003: kartik: Backport for Use the updated recommendation API from liftwing (T371465) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:25 kartik@deploy1003: Started scap sync-world: Backport for Use the updated recommendation API from liftwing (T371465)
  • 13:25 kartik@deploy1003: Finished scap sync-world: Backport for Use the updated recommendation API from liftwing (T371465) (duration: 08m 37s)
  • 13:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 100%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67296 and previous config saved to /var/cache/conftool/dbconfig/20240814-132256-arnaudb.json
  • 13:22 jayme@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-main2010.codfw.wmnet']
  • 13:20 kartik@deploy1003: kartik: Continuing with sync
  • 13:19 kartik@deploy1003: kartik: Backport for Use the updated recommendation API from liftwing (T371465) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:18 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:16 kartik@deploy1003: Started scap sync-world: Backport for Use the updated recommendation API from liftwing (T371465)
  • 13:14 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:14 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:11 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:11 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 75%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67295 and previous config saved to /var/cache/conftool/dbconfig/20240814-130750-arnaudb.json
  • 12:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 50%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67293 and previous config saved to /var/cache/conftool/dbconfig/20240814-125245-arnaudb.json
  • 12:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on 9 hosts with reason: replication table exclusion deployment
  • 12:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on 9 hosts with reason: replication table exclusion deployment
  • 12:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 25%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67292 and previous config saved to /var/cache/conftool/dbconfig/20240814-123739-arnaudb.json
  • 12:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 16%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67291 and previous config saved to /var/cache/conftool/dbconfig/20240814-122234-arnaudb.json
  • 12:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 8%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67290 and previous config saved to /var/cache/conftool/dbconfig/20240814-120729-arnaudb.json
  • 11:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 4%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67289 and previous config saved to /var/cache/conftool/dbconfig/20240814-115223-arnaudb.json
  • 11:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 2%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67288 and previous config saved to /var/cache/conftool/dbconfig/20240814-113718-arnaudb.json
  • 11:23 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:23 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 1%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67287 and previous config saved to /var/cache/conftool/dbconfig/20240814-112212-arnaudb.json
  • 11:20 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:19 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:19 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:18 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 09:56 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 09:26 klausman@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 09:26 klausman@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 09:23 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 09:23 klausman@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 09:17 klausman@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 09:16 klausman@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 09:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: replication still catching up
  • 09:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: replication still catching up
  • 08:53 jayme@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-main2010.codfw.wmnet with OS bullseye
  • 08:46 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye
  • 07:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: index corruption
  • 07:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: index corruption
  • 00:54 eileen: config revision changed from d6f17100 to f569b590
  • 00:41 eileen: civicrm upgraded from dd54b9ae to eecbba5d
  • 00:11 eileen: civicrm upgraded from 686c7c5f to dd54b9ae
  • 00:04 eileen: config revision changed from e8cc0ed6 to d6f17100

2024-08-13

  • 23:08 ejegg: payments-wiki upgraded from 2d48f432 to 3eb3be67
  • 21:56 inflatador: bking@cumin2002 reboot wdqs101[3-5],1018,1020 from DRAC due to unresponsiveness T372442
  • 21:16 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:16 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:15 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:15 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:09 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:09 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:07 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:07 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:51 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 20:22 brett: Update ncmonitor to 1.2.0 via apt1002
  • 19:57 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 19:44 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:43 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:32 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (2 nodes at a time) for ElasticSearch cluster search_eqiad: security update - bking@cumin2002 - T371874
  • 19:29 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards
  • 19:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards
  • 19:25 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:25 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:25 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:25 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:24 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:24 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:05 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.18 refs T366963
  • 18:54 jhuneidi@deploy1003: Finished scap sync-world: Backport for Revert "Prevent dark-mode styles from affecting print media" (T372370) (duration: 10m 58s)
  • 18:50 jhuneidi@deploy1003: jdlrobson, jhuneidi: Continuing with sync
  • 18:46 jhuneidi@deploy1003: jdlrobson, jhuneidi: Backport for Revert "Prevent dark-mode styles from affecting print media" (T372370) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:44 jhuneidi@deploy1003: Started scap sync-world: Backport for Revert "Prevent dark-mode styles from affecting print media" (T372370)
  • 18:42 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:40 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:40 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:45 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade — T371874 - eevans@cumin1002
  • 17:40 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (2 nodes at a time) for ElasticSearch cluster search_eqiad: security update - bking@cumin2002 - T371874
  • 17:39 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: security update - bking@cumin2002 - T371874
  • 17:39 jhuneidi@deploy1003: Finished scap sync-world: testing T371904 (duration: 10m 31s)
  • 17:28 jhuneidi@deploy1003: Started scap sync-world: testing T371904
  • 17:27 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade — T371874 - eevans@cumin1002
  • 17:26 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: security update - bking@cumin2002 - T371874
  • 17:26 swfrench-wmf: run-puppet-agent on deploy1003 to pick up scap.cfg change for T371904
  • 17:25 jhuneidi@deploy1003: Installation of scap version "latest" completed for 211 hosts
  • 17:24 jhuneidi@deploy1003: Installing scap version "latest" for 211 hosts
  • 17:24 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade — T371874 - eevans@cumin1002
  • 17:06 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade — T371874 - eevans@cumin1002
  • 16:57 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:56 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:50 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:50 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 7 hosts with reason: prep for replacement of cloudsw1-d5-eqiad
  • 16:22 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on 7 hosts with reason: prep for replacement of cloudsw1-d5-eqiad
  • 15:39 mutante: gerrit - starting to drop packets from abusive sources (T365259)
  • 15:38 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2006.codfw.wmnet with OS bookworm
  • 14:56 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bookworm
  • 14:25 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2009.codfw.wmnet with OS bullseye
  • 14:24 btullis@deploy1003: Finished deploy [airflow-dags/wmde@109c99e]: (no justification provided) (duration: 00m 08s)
  • 14:24 btullis@deploy1003: Started deploy [airflow-dags/wmde@109c99e]: (no justification provided)
  • 14:24 btullis@deploy1003: Finished deploy [airflow-dags/analytics_product@109c99e]: (no justification provided) (duration: 00m 09s)
  • 14:23 btullis@deploy1003: Started deploy [airflow-dags/analytics_product@109c99e]: (no justification provided)
  • 14:23 btullis@deploy1003: Finished deploy [airflow-dags/platform_eng@109c99e]: (no justification provided) (duration: 00m 24s)
  • 14:23 btullis@deploy1003: Started deploy [airflow-dags/platform_eng@109c99e]: (no justification provided)
  • 14:22 btullis@deploy1003: Finished deploy [airflow-dags/research@109c99e]: (no justification provided) (duration: 00m 11s)
  • 14:22 btullis@deploy1003: Started deploy [airflow-dags/research@109c99e]: (no justification provided)
  • 14:22 btullis@deploy1003: Finished deploy [airflow-dags/search@109c99e]: (no justification provided) (duration: 00m 19s)
  • 14:21 btullis@deploy1003: Started deploy [airflow-dags/search@109c99e]: (no justification provided)
  • 14:21 btullis@deploy1003: Finished deploy [airflow-dags/analytics_test@109c99e]: (no justification provided) (duration: 00m 09s)
  • 14:21 btullis@deploy1003: Started deploy [airflow-dags/analytics_test@109c99e]: (no justification provided)
  • 14:18 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:18 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
  • 13:57 Lucas_WMDE: UTC backport+config window done (since ~13:10, really)
  • 13:49 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@109c99e]: Airflow upgrade to v 2.9.3 for analytics instance. T365449. (duration: 00m 40s)
  • 13:48 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@109c99e]: Airflow upgrade to v 2.9.3 for analytics instance. T365449.
  • 13:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: update wheels - ayounsi@cumin1002
  • 13:41 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: update wheels - ayounsi@cumin1002
  • 13:40 XioNoX: update homer wheels - T371890
  • 13:36 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye
  • 13:35 jayme@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2009.codfw.wmnet with OS bullseye
  • 13:26 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye
  • 13:25 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2009.codfw.wmnet with OS bullseye
  • 13:05 elukey: `apt-get install python3-conftool python3-conftool-requestctl` on all puppetserver nodes - upgrade to 3.2.2
  • 13:04 Lucas_WMDE: FINISHED lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/cleanupTitles.php --wiki=hewikisource --prefix=T314733 2>&1 | tee ~/T314733.log
  • 13:01 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/cleanupTitles.php --wiki=hewikisource --prefix=T314733 2>&1 | tee ~/T314733.log
  • 12:47 filippo@deploy1003: Finished scap: new statsd-exporter limits (duration: 03m 52s)
  • 12:43 filippo@deploy1003: Started scap sync-world: new statsd-exporter limits
  • 12:37 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye
  • 12:18 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2010.codfw.wmnet with OS bullseye
  • 11:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-launcher1002.eqiad.wmnet
  • 11:29 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-launcher1002.eqiad.wmnet
  • 11:28 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye
  • 11:17 XioNoX: deploy pfw policy update 1723510554 - T372367
  • 11:11 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1005.eqiad.wmnet
  • 11:07 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1005.eqiad.wmnet
  • 11:01 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1006.eqiad.wmnet
  • 10:57 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1006.eqiad.wmnet
  • 10:53 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet
  • 10:49 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet
  • 10:38 dcaro@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:38 dcaro@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added ipv6 entry for cloudcephosd1039 - dcaro@cumin1002"
  • 10:38 dcaro@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added ipv6 entry for cloudcephosd1039 - dcaro@cumin1002"
  • 10:38 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
  • 10:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
  • 10:27 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s8
  • 10:27 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s5
  • 10:26 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1016.eqiad.wmnet with OS bookworm
  • 10:15 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s8
  • 10:13 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s8
  • 10:13 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s5
  • 10:11 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1007.eqiad.wmnet
  • 10:10 dcaro@cumin1002: START - Cookbook sre.dns.netbox
  • 10:07 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1007.eqiad.wmnet
  • 10:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 10:05 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-airflow1007.eqiad.wmnet
  • 10:05 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1007.eqiad.wmnet
  • 10:02 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1016.eqiad.wmnet with reason: host reimage
  • 10:00 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 09:59 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1016.eqiad.wmnet with reason: host reimage
  • 09:59 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 09:54 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2010.codfw.wmnet with OS bullseye
  • 09:46 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1016.eqiad.wmnet with OS bookworm
  • 09:41 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s8
  • 09:41 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s5
  • 09:40 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1016.eqiad.wmnet with reason: Reimaging clouddb1016 T365424
  • 09:40 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1016.eqiad.wmnet with reason: Reimaging clouddb1016 T365424
  • 09:23 elukey: manual run of dump_cloud_ip_ranges.service on puppetserver1001 (failed earlier on)
  • 09:03 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye
  • 09:01 kevinbazira@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 08:59 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 08:52 elukey: upgrade conftool python packages on puppetserver1001 to 3.2.2
  • 08:51 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:49 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2002.codfw.wmnet
  • 08:48 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:36 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
  • 08:19 XioNoX: upgrade postgresql on netboxdb hosts
  • 08:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 08:00 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 07:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 07:47 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 07:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: index corruption
  • 07:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: index corruption
  • 07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'es1 master: es1027', diff saved to https://phabricator.wikimedia.org/P67282 and previous config saved to /var/cache/conftool/dbconfig/20240813-071240-arnaudb.json
  • 04:00 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.15 (duration: 00m 56s)
  • 03:50 mwpresync@deploy1003: Finished scap: testwikis to 1.43.0-wmf.18 refs T366963 (duration: 48m 26s)
  • 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.18 refs T366963

2024-08-12

  • 23:00 rzl@deploy1003: Finished scap: https://gerrit.wikimedia.org/r/1060515 (duration: 02m 14s)
  • 22:58 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1060515
  • 21:22 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Apply openjdk upgrade — T371874 - eevans@cumin1002
  • 21:17 zabe: start wrapping type B password hashes with encrypted pbkdf2 in screen - T112359
  • 20:50 jhathaway: upgrading postgresql on puppetdb1003
  • 20:45 jhathaway: upgrading postgresql on puppetdb2003
  • 20:29 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 20:29 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 20:24 zabe: update prefix of wrongly prefixed user password hashes from ':A:' to ':B:' in small batches -- T112359
  • 20:22 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:19 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:19 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:18 zabe@deploy1003: Finished scap: Backport for EventStreamConfig for mediawiki.cirrussearch.page_weighted_tags_change.rc0 (T366253) (duration: 07m 55s)
  • 20:14 zabe@deploy1003: pfischer, zabe: Continuing with sync
  • 20:13 zabe@deploy1003: pfischer, zabe: Backport for EventStreamConfig for mediawiki.cirrussearch.page_weighted_tags_change.rc0 (T366253) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:10 zabe@deploy1003: Started scap sync-world: Backport for EventStreamConfig for mediawiki.cirrussearch.page_weighted_tags_change.rc0 (T366253)
  • 20:10 zabe@deploy1003: Finished scap: Backport for Set wgAutoConfirmCount to 10 for azwiki (T372172) (duration: 08m 01s)
  • 20:05 zabe@deploy1003: nmw03, zabe: Continuing with sync
  • 20:04 zabe@deploy1003: nmw03, zabe: Backport for Set wgAutoConfirmCount to 10 for azwiki (T372172) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:02 zabe@deploy1003: Started scap sync-world: Backport for Set wgAutoConfirmCount to 10 for azwiki (T372172)
  • 20:01 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:47 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 19:47 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 19:46 ottomata: rolling restart of eventgate-main in codfw - T371767
  • 19:38 zabe@deploy1003: Finished scap: Backport for Use encrypted PBKDF2 for wrapping B type passwords instead of Argon2 (T112359) (duration: 07m 08s)
  • 19:33 zabe@deploy1003: zabe: Continuing with sync
  • 19:33 zabe@deploy1003: zabe: Backport for Use encrypted PBKDF2 for wrapping B type passwords instead of Argon2 (T112359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P67279 and previous config saved to /var/cache/conftool/dbconfig/20240812-193157-ladsgroup.json
  • 19:30 zabe@deploy1003: Started scap sync-world: Backport for Use encrypted PBKDF2 for wrapping B type passwords instead of Argon2 (T112359)
  • 19:21 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=dewiki --force --db-table --verbose # T372333, script finished, logs are (gzipped) at F57269843
  • 19:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P67278 and previous config saved to /var/cache/conftool/dbconfig/20240812-191650-ladsgroup.json
  • 19:15 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=dewiki --force --db-table --verbose # T372333, script started
  • 19:13 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=dewiki --search-index --verbose # T372333, logs available as P67277
  • 19:09 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply openjdk upgrade — T371874 - eevans@cumin1002
  • 19:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P67276 and previous config saved to /var/cache/conftool/dbconfig/20240812-190145-ladsgroup.json
  • 18:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1238 (T371342)', diff saved to https://phabricator.wikimedia.org/P67275 and previous config saved to /var/cache/conftool/dbconfig/20240812-184830-ladsgroup.json
  • 18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P67274 and previous config saved to /var/cache/conftool/dbconfig/20240812-184639-ladsgroup.json
  • 18:06 urbanecm@deploy1003: Finished scap: Backport for [Growth] dewiki: Enable frontend for Add Link (T371597) (duration: 09m 59s)
  • 18:02 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:02 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:02 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 17:58 urbanecm@deploy1003: urbanecm: Backport for [Growth] dewiki: Enable frontend for Add Link (T371597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:57 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:57 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:56 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] dewiki: Enable frontend for Add Link (T371597)
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testhost2001.codfw.wmnet with OS bookworm
  • 17:27 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:27 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
  • 17:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
  • 16:59 urbanecm@deploy1003: Finished scap: Backport for Revert "[Growth] dewiki: Enable frontend for Add Link" (duration: 06m 39s)
  • 16:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
  • 16:55 urbanecm@deploy1003: urbanecm, trainbranchbot: Continuing with sync
  • 16:55 urbanecm@deploy1003: urbanecm, trainbranchbot: Backport for Revert "[Growth] dewiki: Enable frontend for Add Link" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:53 urbanecm@deploy1003: Started scap sync-world: Backport for Revert "[Growth] dewiki: Enable frontend for Add Link"
  • 16:51 urbanecm@deploy1003: Sync cancelled.
  • 16:50 urbanecm@deploy1003: urbanecm: Backport for [Growth] dewiki: Enable frontend for Add Link (T371597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:48 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] dewiki: Enable frontend for Add Link (T371597)
  • 16:36 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:36 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:33 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: security update - bking@cumin2002 - T371874
  • 16:32 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:32 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 16:13 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 16:10 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:09 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:00 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:00 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:56 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:56 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@416511b]: (no justification provided) (duration: 00m 40s)
  • 15:55 milimetric@deploy1003: Started deploy [airflow-dags/analytics@416511b]: (no justification provided)
  • 15:54 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:54 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:54 cdanis@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:54 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 15:53 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 15:53 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 15:53 cdanis@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 15:53 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 15:52 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 15:52 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 15:52 cdanis@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 15:52 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 15:51 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 15:51 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 15:50 cdanis@deploy1003: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 15:36 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 15:35 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 15:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 15:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:23 urbanecm@deploy1003: Finished scap: Backport for noc: Fix list of databases in db.php (T372249) (duration: 08m 22s)
  • 15:15 urbanecm@deploy1003: Started scap sync-world: Backport for noc: Fix list of databases in db.php (T372249)
  • 15:07 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:06 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:44 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: security update - bking@cumin2002 - T371874
  • 14:42 elukey: powercycle ms-be1078 - causing frontend errors in swift-eqiad, network link is down (interface down/up didn't work, nothing in dmesg/syslog)
  • 14:42 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:41 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:34 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:23 zabe@deploy1003: Finished scap: Backport for Further configuration for bdrwiki (T371760) (duration: 21m 07s)
  • 14:01 zabe@deploy1003: Started scap sync-world: Backport for Further configuration for bdrwiki (T371760)
  • 13:46 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 13:46 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 13:33 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 13:33 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 13:25 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:24 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:24 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:37 elukey: restart exim4 on list2001 to pick up the new TLS material
  • 12:35 elukey: restart exim4 on list1004 to pick up the new TLS material
  • 12:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 12:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 12:11 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Openjdk upgrade - elukey@cumin1002
  • 12:04 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 12:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 11:59 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 11:26 hnowlan: rebuilding php7.4-fpm and php7.4-fpm-multiversion-base to pick up healthz worker awareness change (r/1060867)
  • 11:22 ladsgroup@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 11:10 kevinbazira@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:06 isaranto@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:04 isaranto@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:03 isaranto@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:19 vgutierrez: restarting apache on puppetmaster1003
  • 09:54 kamila_: rebooting puppetmaster1001 due to intermittent network failures
  • 09:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 54994
  • 09:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 54994
  • 09:17 urbanecm@deploy1003: Finished scap: Backport for MenteeOverviewApi: Do not apply undefined/null params (T372164) (duration: 19m 54s)
  • 09:11 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 09:11 godog: bounce grafana after https://gerrit.wikimedia.org/r/c/operations/puppet/+/1061955
  • 09:10 urbanecm@deploy1003: urbanecm: Backport for MenteeOverviewApi: Do not apply undefined/null params (T372164) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:57 urbanecm@deploy1003: Started scap sync-world: Backport for MenteeOverviewApi: Do not apply undefined/null params (T372164)
  • 07:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: index corruption
  • 07:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: index corruption
  • 07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 - s2', diff saved to https://phabricator.wikimedia.org/P67270 and previous config saved to /var/cache/conftool/dbconfig/20240812-073846-arnaudb.json

2024-08-11

  • 07:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 07:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367856)', diff saved to https://phabricator.wikimedia.org/P67269 and previous config saved to /var/cache/conftool/dbconfig/20240811-075839-marostegui.json
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P67268 and previous config saved to /var/cache/conftool/dbconfig/20240811-074332-marostegui.json
  • 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P67267 and previous config saved to /var/cache/conftool/dbconfig/20240811-072825-marostegui.json
  • 07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367856)', diff saved to https://phabricator.wikimedia.org/P67266 and previous config saved to /var/cache/conftool/dbconfig/20240811-071318-marostegui.json
  • 03:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20240729/ using stat1009.eqiad.wmnet)

2024-08-10

  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T367856)', diff saved to https://phabricator.wikimedia.org/P67264 and previous config saved to /var/cache/conftool/dbconfig/20240810-085527-marostegui.json
  • 08:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367856)', diff saved to https://phabricator.wikimedia.org/P67263 and previous config saved to /var/cache/conftool/dbconfig/20240810-085505-marostegui.json
  • 08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P67262 and previous config saved to /var/cache/conftool/dbconfig/20240810-083958-marostegui.json
  • 08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P67261 and previous config saved to /var/cache/conftool/dbconfig/20240810-082450-marostegui.json
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367856)', diff saved to https://phabricator.wikimedia.org/P67260 and previous config saved to /var/cache/conftool/dbconfig/20240810-080943-marostegui.json

2024-08-09

  • 22:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1299.eqiad.wmnet with OS bullseye
  • 21:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1266.eqiad.wmnet with OS bullseye
  • 21:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 21:29 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1266.eqiad.wmnet with reason: host reimage
  • 21:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1266.eqiad.wmnet with reason: host reimage
  • 21:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1266.eqiad.wmnet with OS bullseye
  • 21:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1299.eqiad.wmnet with OS bullseye
  • 21:09 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1299.eqiad.wmnet with OS bullseye
  • 21:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1299.eqiad.wmnet with OS bullseye
  • 20:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1297.eqiad.wmnet with OS bullseye
  • 20:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1297.eqiad.wmnet with reason: host reimage
  • 20:06 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1297.eqiad.wmnet with reason: host reimage
  • 20:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1300.eqiad.wmnet with OS bullseye
  • 19:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1300.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1303.eqiad.wmnet with OS bullseye
  • 19:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1301.eqiad.wmnet with OS bullseye
  • 19:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1303.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:50 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1302.eqiad.wmnet with OS bullseye
  • 19:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1299.eqiad.wmnet with OS bullseye
  • 19:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1304.eqiad.wmnet with OS bullseye
  • 19:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1297.eqiad.wmnet with OS bullseye
  • 19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1301.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1302.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1304.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1299.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1297.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:29 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1303.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1302.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1304.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1299.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1301.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1300.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1297.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker - jclark@cumin1002"
  • 19:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker - jclark@cumin1002"
  • 19:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 19:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1260.eqiad.wmnet with OS bullseye
  • 19:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:10 inflatador: bking@wdqs-codfw-public mitigate codfw wdqs abuse via nginx hotfix T372074
  • 17:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1260.eqiad.wmnet with reason: host reimage
  • 17:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1260.eqiad.wmnet with reason: host reimage
  • 17:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1260.eqiad.wmnet with OS bullseye
  • 17:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1016.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1019.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1018.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1020.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:14 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1260
  • 17:13 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1260
  • 17:12 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 17:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:00 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1266.eqiad.wmnet with OS bullseye
  • 16:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1018.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:50 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-presto1018.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1020.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1019.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1018.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1016.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1266.eqiad.wmnet with reason: host reimage
  • 16:46 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt an-presto1016-20 - jclark@cumin1002"
  • 16:46 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt an-presto1016-20 - jclark@cumin1002"
  • 16:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1266.eqiad.wmnet with reason: host reimage
  • 16:43 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 16:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1266.eqiad.wmnet with OS bullseye
  • 16:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1260.eqiad.wmnet with OS bullseye
  • 15:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus1007.eqiad.wmnet with OS bookworm
  • 15:08 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus1008.eqiad.wmnet with OS bookworm
  • 15:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host alert1002.wikimedia.org with OS bookworm
  • 14:52 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus1008.eqiad.wmnet with reason: host reimage
  • 14:37 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus1008.eqiad.wmnet with reason: host reimage
  • 14:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus1007.eqiad.wmnet with reason: host reimage
  • 14:26 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus1007.eqiad.wmnet with reason: host reimage
  • 13:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host alert1002.wikimedia.org with OS bookworm
  • 13:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host prometheus1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host prometheus1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:55 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt prometheus1007-8 - jclark@cumin1002"
  • 13:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt prometheus1007-8 - jclark@cumin1002"
  • 13:52 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host alert1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host alert1002
  • 13:29 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host alert1002
  • 13:23 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt allert1004 - jclark@cumin1002"
  • 13:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt allert1004 - jclark@cumin1002"
  • 13:20 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host alert1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T367856)', diff saved to https://phabricator.wikimedia.org/P67259 and previous config saved to /var/cache/conftool/dbconfig/20240809-080904-marostegui.json
  • 08:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367856)', diff saved to https://phabricator.wikimedia.org/P67258 and previous config saved to /var/cache/conftool/dbconfig/20240809-080842-marostegui.json
  • 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P67257 and previous config saved to /var/cache/conftool/dbconfig/20240809-075335-marostegui.json
  • 07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P67256 and previous config saved to /var/cache/conftool/dbconfig/20240809-073828-marostegui.json
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367856)', diff saved to https://phabricator.wikimedia.org/P67254 and previous config saved to /var/cache/conftool/dbconfig/20240809-072320-marostegui.json
  • 05:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_main on wdqs1021.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
  • 04:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15:00:00 on 9 hosts with reason: T364368 non-prod hosts
  • 04:40 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 15:00:00 on 9 hosts with reason: T364368 non-prod hosts
  • 04:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T364077, transfer to unpooled host (1022) to test cookbook changes) xfer wikidata from wdqs1012.eqiad.wmnet -> wdqs1022.eqiad.wmnet, repooling source-only afterwards
  • 04:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, transfer to unpooled host (1022) to test cookbook changes) xfer wikidata from wdqs1012.eqiad.wmnet -> wdqs1022.eqiad.wmnet, repooling source-only afterwards
  • 04:03 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 03s)
  • 04:03 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 04:03 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 30s)
  • 04:02 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 01:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bookworm

2024-08-08

  • 22:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
  • 22:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bookworm
  • 21:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
  • 21:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001.codfw.wmnet']
  • 21:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001.codfw.wmnet']
  • 21:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testhost2001.codfw.wmnet with OS bookworm
  • 21:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
  • 21:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
  • 21:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
  • 21:29 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: security update - bking@cumin2002 - T371874
  • 21:21 ebernhardson@deploy1003: Synchronized private/PrivateSettings.php: Update NetworkSession users list for T341332 (duration: 06m 15s)
  • 21:05 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: security update - bking@cumin2002 - T371874
  • 20:21 samtar@deploy1003: Finished scap: Backport for Enable protection indicators for azwiki (T371440) (duration: 08m 22s)
  • 20:17 samtar@deploy1003: samtar, nmw03: Continuing with sync
  • 20:15 samtar@deploy1003: samtar, nmw03: Backport for Enable protection indicators for azwiki (T371440) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:13 samtar@deploy1003: Started scap sync-world: Backport for Enable protection indicators for azwiki (T371440)
  • 20:12 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@0266527]: (no justification provided) (duration: 00m 49s)
  • 20:11 milimetric@deploy1003: Started deploy [airflow-dags/analytics@0266527]: (no justification provided)
  • 19:57 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 19:56 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 19:56 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 19:55 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 19:55 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 19:54 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 19:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
  • 19:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bookworm
  • 19:31 dancy@deploy1003: Started scap sync-world: testing T371904
  • 19:23 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T371874
  • 19:20 dancy@deploy1003: Started scap sync-world: testing T371904
  • 19:13 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T371874
  • 19:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
  • 19:06 dancy@deploy1003: Finished scap: testing T371904 (duration: 02m 40s)
  • 19:05 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
  • 19:05 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1024.eqiad.wmnet with OS bullseye
  • 19:03 dancy@deploy1003: Started scap sync-world: testing T371904
  • 19:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2011.codfw.wmnet with OS bookworm
  • 19:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2010.codfw.wmnet with OS bookworm
  • 19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:58 ryankemper: [Elastic] `ryankemper@cumin2002:~$ sudo -E cumin 'elastic2062*,elastic2082*,elastic2088*,elastic2090*,elastic2099*,elastic2103*' 'pool'` (hosts that had not been repooled after previous maintenance)
  • 18:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2011.codfw.wmnet with reason: host reimage
  • 18:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 18:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2010.codfw.wmnet with reason: host reimage
  • 18:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2011.codfw.wmnet with reason: host reimage
  • 18:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2010.codfw.wmnet with reason: host reimage
  • 18:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 18:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2011.codfw.wmnet with OS bookworm
  • 18:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2010.codfw.wmnet with OS bookworm
  • 17:45 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
  • 17:44 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1024.eqiad.wmnet with OS bullseye
  • 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2009.codfw.wmnet with OS bookworm
  • 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:22 dreamyjazz@deploy1003: Finished scap: Backport for Convert gb_id to integer in GlobalBlock (T372063) (duration: 06m 48s)
  • 17:17 dreamyjazz@deploy1003: urbanecm, dreamyjazz: Continuing with sync
  • 17:17 dreamyjazz@deploy1003: urbanecm, dreamyjazz: Backport for Convert gb_id to integer in GlobalBlock (T372063) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for Convert gb_id to integer in GlobalBlock (T372063)
  • 17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2009.codfw.wmnet with reason: host reimage
  • 17:11 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2009.codfw.wmnet with reason: host reimage
  • 17:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20240729/ using stat1009.eqiad.wmnet)
  • 17:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2009.codfw.wmnet with OS bookworm
  • 17:08 inflatador: bking@wdqs1020 restart wdqs-blazegraph service due to excessive GC
  • 16:29 elukey: debmonitor-client 0.4.0 rolled out to all bullseye nodes
  • 16:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
  • 16:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating mgmt ips in codfw - jhancock@cumin2002"
  • 16:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating mgmt ips in codfw - jhancock@cumin2002"
  • 16:20 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bookworm
  • 16:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
  • 16:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bookworm
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
  • 16:07 elukey: on cumin1002 "sudo cumin -b 20 -p 95 'P{F:lsbdistcodename="bullseye"} and A:codfw' 'run-puppet-agent -q --failed-only'"
  • 16:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2010.codfw.wmnet with OS bookworm
  • 16:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2009.codfw.wmnet with OS bookworm
  • 16:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2010.codfw.wmnet with OS bookworm
  • 16:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2009.codfw.wmnet with OS bookworm
  • 16:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
  • 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 15:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001.codfw.wmnet']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2011.codfw.wmnet with OS bookworm
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2010.codfw.wmnet with OS bookworm
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2009.codfw.wmnet with OS bookworm
  • 15:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001.codfw.wmnet']
  • 15:21 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2004.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 15:21 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2004.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 15:16 Reedy: test
  • 14:52 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Running sync-netbox-hiera manually because it failed during the reimage - fnegri@cumin1002 - T365424"
  • 14:51 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Running sync-netbox-hiera manually because it failed during the reimage - fnegri@cumin1002 - T365424"
  • 14:40 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 14:40 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 14:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2011.codfw.wmnet with OS bookworm
  • 14:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2010.codfw.wmnet with OS bookworm
  • 14:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2009.codfw.wmnet with OS bookworm
  • 14:25 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1018.eqiad.wmnet with OS bookworm
  • 14:24 fnegri@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002"
  • 14:24 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002"
  • 14:02 ladsgroup@deploy1003: ladsgroup: Backport for Add missing close tags to #contentSub message (T372054) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:01 stevemunene@deploy1003: Finished deploy [airflow-dags/analytics_test@2a3060e]: (no justification provided) (duration: 00m 33s)
  • 14:00 stevemunene@deploy1003: Started deploy [airflow-dags/analytics_test@2a3060e]: (no justification provided)
  • 13:59 ladsgroup@deploy1003: Started scap sync-world: Backport for Add missing close tags to #contentSub message (T372054)
  • 13:51 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 13:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:47 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:44 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: host reimage
  • 13:41 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1018.eqiad.wmnet with reason: host reimage
  • 13:28 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1018.eqiad.wmnet with OS bookworm
  • 13:25 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1018.eqiad.wmnet with reason: Reimaging clouddb1018 T365424
  • 13:25 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1018.eqiad.wmnet with reason: Reimaging clouddb1018 T365424
  • 13:24 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 13:24 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 12:47 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.17 refs T366962
  • 12:23 samtar@deploy1003: Finished scap: Backport for mswikisource: add custom logos (T372031) (duration: 08m 47s)
  • 12:22 dcausse: T371401: reindexing wikidatawiki@codfw to index mul labels
  • 12:18 samtar@deploy1003: chlod, samtar: Continuing with sync
  • 12:18 samtar@deploy1003: chlod, samtar: Backport for mswikisource: add custom logos (T372031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:14 samtar@deploy1003: Started scap sync-world: Backport for mswikisource: add custom logos (T372031)
  • 12:11 samtar@deploy1003: Finished scap: Backport for bdrwiki: add custom logos (T372031) (duration: 09m 20s)
  • 12:06 samtar@deploy1003: chlod, samtar: Continuing with sync
  • 12:05 samtar@deploy1003: chlod, samtar: Backport for bdrwiki: add custom logos (T372031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:01 samtar@deploy1003: Started scap sync-world: Backport for bdrwiki: add custom logos (T372031)
  • 11:58 samtar@deploy1003: Finished scap: Backport for dtpwiki: add custom logos (T372031) (duration: 10m 10s)
  • 11:53 samtar@deploy1003: chlod, samtar: Continuing with sync
  • 11:52 samtar@deploy1003: chlod, samtar: Backport for dtpwiki: add custom logos (T372031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:48 samtar@deploy1003: Started scap sync-world: Backport for dtpwiki: add custom logos (T372031)
  • 11:35 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 10:39 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.17 refs T366962
  • 09:53 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.17 refs T366962
  • 09:38 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Openjdk upgrade - elukey@cumin1002
  • 09:37 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1009.eqiad.wmnet with reason: Rebooting due to CPU soft lockup
  • 09:37 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1009.eqiad.wmnet with reason: Rebooting due to CPU soft lockup
  • 09:32 dreamyjazz@deploy1003: Finished scap: Backport for Fix DefaultPresenter rejecting IPCountInfo instances (T371966) (duration: 10m 38s)
  • 09:27 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 09:24 elukey: powercycle ml-serve2004 - host frozen, no ssh access, get sel shows "Multi-bit memory errors detected on a memory device at location(s) DIMM_A2."
  • 09:23 dreamyjazz@deploy1003: dreamyjazz: Backport for Fix DefaultPresenter rejecting IPCountInfo instances (T371966) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:21 dreamyjazz@deploy1003: Started scap sync-world: Backport for Fix DefaultPresenter rejecting IPCountInfo instances (T371966)
  • 08:45 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:30 dcausse: T371401: reindexing wikidatawiki@eqiad to index mul labels
  • 08:23 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "vrts1003+gerrit1004 - ayounsi@cumin1002"
  • 08:23 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "vrts1003+gerrit1004 - ayounsi@cumin1002"
  • 08:19 elukey: restart dump_ip_reputation.service on puppetserver1001
  • 08:13 elukey: restart tomcat on idp[1,2]003 to pick up the new openjdk
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T367856)', diff saved to https://phabricator.wikimedia.org/P67252 and previous config saved to /var/cache/conftool/dbconfig/20240808-081041-marostegui.json
  • 08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T367856)', diff saved to https://phabricator.wikimedia.org/P67251 and previous config saved to /var/cache/conftool/dbconfig/20240808-081019-marostegui.json
  • 08:09 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Openjdk upgrade - elukey@cumin1002
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P67250 and previous config saved to /var/cache/conftool/dbconfig/20240808-075512-marostegui.json
  • 07:42 logmsgbot: @deploy1003 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:42 logmsgbot: @deploy1003 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P67249 and previous config saved to /var/cache/conftool/dbconfig/20240808-074005-marostegui.json
  • 07:32 dcausse: T371401: reindexing testwikidatawiki to index mul labels
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T367856)', diff saved to https://phabricator.wikimedia.org/P67248 and previous config saved to /var/cache/conftool/dbconfig/20240808-072458-marostegui.json
  • 07:19 hashar: Restarted CI Jenkins for upgrade and plugin update # T371976
  • 07:11 dcausse@deploy1003: Finished scap: Backport for search: index stems for mul labels (T371401) (duration: 09m 03s)
  • 07:06 dcausse@deploy1003: dcausse: Continuing with sync
  • 07:04 dcausse@deploy1003: dcausse: Backport for search: index stems for mul labels (T371401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:02 dcausse@deploy1003: Started scap sync-world: Backport for search: index stems for mul labels (T371401)
  • 06:57 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:57 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:57 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 06:57 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 06:51 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 06:51 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 06:51 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 06:51 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 06:51 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 06:51 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 06:51 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 06:51 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 06:51 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 06:51 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 06:51 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 06:51 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 06:51 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 06:50 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 06:50 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 06:50 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 06:50 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 06:50 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 06:50 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 06:50 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 06:49 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 06:48 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 06:48 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 06:48 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 06:47 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 06:43 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 06:42 hashar: restarting Gerrit
  • 06:42 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 06:41 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 06:41 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 06:41 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 06:41 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 06:35 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 02:19 cstone: civicrm upgraded from d1f1d7bd to 686c7c5f
  • 00:30 rzl@deploy1003: Finished scap: https://gerrit.wikimedia.org/r/1060184 (duration: 02m 33s)
  • 00:29 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1060184

2024-08-07

  • 21:23 cstone: payments-wiki upgraded from 88500664 to a7f3301a
  • 21:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host vrts1003.eqiad.wmnet with OS bookworm
  • 21:03 cstone: payments-wiki upgraded from 49a9e765 to 88500664
  • 21:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts1003.eqiad.wmnet with reason: host reimage
  • 20:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts1003.eqiad.wmnet with reason: host reimage
  • 20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gerrit1004.wikimedia.org with OS bookworm
  • 20:53 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@4cf9922]: (no justification provided) (duration: 00m 38s)
  • 20:53 milimetric@deploy1003: Started deploy [airflow-dags/analytics@4cf9922]: (no justification provided)
  • 20:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1004.wikimedia.org with reason: host reimage
  • 20:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host vrts1003.eqiad.wmnet with OS bookworm
  • 20:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host vrts1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1004.wikimedia.org with reason: host reimage
  • 20:21 cjming: end of UTC late backport window
  • 20:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host gerrit1004.wikimedia.org with OS bookworm
  • 20:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gerrit1004.wikimedia.org with OS bookworm
  • 20:11 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@049c09e]: (no justification provided) (duration: 00m 03s)
  • 20:11 milimetric@deploy1003: Started deploy [airflow-dags/analytics@049c09e]: (no justification provided)
  • 20:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host vrts1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt vrts1003 - jclark@cumin1002"
  • 20:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt vrts1003 - jclark@cumin1002"
  • 20:01 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 19:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1004.wikimedia.org with reason: host reimage
  • 19:55 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1004.wikimedia.org with reason: host reimage
  • 19:53 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@049c09e]: (no justification provided) (duration: 00m 59s)
  • 19:52 milimetric@deploy1003: Started deploy [airflow-dags/analytics@049c09e]: (no justification provided)
  • 19:52 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@216348d]: (no justification provided) (duration: 00m 47s)
  • 19:51 milimetric@deploy1003: Started deploy [airflow-dags/analytics@216348d]: (no justification provided)
  • 19:47 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@049c09e]: Deploying new Browser General job (duration: 00m 02s)
  • 19:47 milimetric@deploy1003: Started deploy [airflow-dags/analytics@049c09e]: Deploying new Browser General job
  • 19:46 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@049c09e]: Deploying new Browser General job (duration: 00m 41s)
  • 19:45 milimetric@deploy1003: Started deploy [airflow-dags/analytics@049c09e]: Deploying new Browser General job
  • 19:39 ebernhardson@deploy1003: Finished deploy [airflow-dags/search@049c09e]: workaround process_sparql_query oom issues (duration: 00m 20s)
  • 19:39 ebernhardson@deploy1003: Started deploy [airflow-dags/search@049c09e]: workaround process_sparql_query oom issues
  • 19:38 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host gerrit1004.wikimedia.org with OS bookworm
  • 19:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:33 brett: start pybal on lvs1017
  • 19:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
  • 19:29 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
  • 19:18 brennen@deploy1003: Finished scap: Backport for Fix TypeError in PendingChanges by handling null subPage (T371986) (duration: 08m 23s)
  • 19:14 brennen@deploy1003: brennen: Continuing with sync
  • 19:12 brennen@deploy1003: brennen: Backport for Fix TypeError in PendingChanges by handling null subPage (T371986) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host gerrit1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:11 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt gerrit1004 - jclark@cumin1002"
  • 19:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt gerrit1004 - jclark@cumin1002"
  • 19:10 brennen@deploy1003: Started scap sync-world: Backport for Fix TypeError in PendingChanges by handling null subPage (T371986)
  • 19:08 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 19:04 brett: stop pybal on lvs1017 for server reboot
  • 19:00 brett: start pybal on lvs1018
  • 18:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 18:56 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 18:45 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 18:40 brett: stop pybal on lvs1018 for server reboot
  • 18:39 milimetric@deploy1003: Finished deploy [analytics/refinery@fe20690]: Syncing browser general script hive version (duration: 16m 05s)
  • 18:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
  • 18:33 brett: start pybal on lvs1019
  • 18:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
  • 18:32 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1038.eqiad.wmnet with OS bullseye
  • 18:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 18:28 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 18:22 milimetric@deploy1003: Started deploy [analytics/refinery@fe20690]: Syncing browser general script hive version
  • 18:20 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
  • 18:17 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
  • 18:14 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
  • 18:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
  • 18:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
  • 18:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
  • 17:54 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1038.eqiad.wmnet with OS bullseye
  • 17:41 sukhe: running authdns-update for Yahoo CFL TXT record: T370963
  • 17:35 brennen@deploy1003: Finished scap: Backport for Revert "Drop writeapi flag from siteinfo API" (T115414 T294397 T371977) (duration: 08m 06s)
  • 17:34 milimetric@deploy1003: Finished deploy [analytics/refinery@0d25645] (thin): Syncing browser general script, and refinery-source 0.2.45 apparently (duration: 04m 21s)
  • 17:31 brennen@deploy1003: brennen, bd808: Continuing with sync
  • 17:30 milimetric@deploy1003: Started deploy [analytics/refinery@0d25645] (thin): Syncing browser general script, and refinery-source 0.2.45 apparently
  • 17:29 brennen@deploy1003: brennen, bd808: Backport for Revert "Drop writeapi flag from siteinfo API" (T115414 T294397 T371977) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:29 milimetric@deploy1003: Finished deploy [analytics/refinery@0d25645]: Syncing browser general script, and refinery-source 0.2.45 apparently (duration: 54m 21s)
  • 17:27 brennen@deploy1003: Started scap sync-world: Backport for Revert "Drop writeapi flag from siteinfo API" (T115414 T294397 T371977)
  • 17:17 brett: stop pybal on lvs1019 for server reboot
  • 17:14 brett: start pybal on lvs2014
  • 17:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2014.codfw.wmnet
  • 17:08 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet
  • 17:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 16:42 brett: stop pybal on lvs2014 for server reboot
  • 16:37 mutante: puppetserver1002 systemctl start dump_ip_reputation
  • 16:34 milimetric@deploy1003: Started deploy [analytics/refinery@0d25645]: Syncing browser general script, and refinery-source 0.2.45 apparently
  • 16:27 brett: start pybal on lvs2013
  • 16:15 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1038.eqiad.wmnet with OS bullseye
  • 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P67246 and previous config saved to /var/cache/conftool/dbconfig/20240807-161452-ladsgroup.json
  • 16:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet
  • 16:08 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
  • 16:01 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Openjdk upgrade - elukey@cumin1002
  • 15:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
  • 15:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
  • 15:40 brett: stop pybal on lvs2013 for server reboot
  • 08:18 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.17 refs T366962
  • 07:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add role to mgmt devices - ayounsi@cumin1002"
  • 07:43 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add role to mgmt devices - ayounsi@cumin1002"
  • 03:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 02:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1296.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1296.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:04 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1288.eqiad.wmnet with OS bullseye
  • 02:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:02 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1294.eqiad.wmnet with OS bullseye
  • 01:57 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1289.eqiad.wmnet with OS bullseye
  • 01:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:53 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1290.eqiad.wmnet with OS bullseye
  • 01:52 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
  • 01:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1287.eqiad.wmnet with OS bullseye
  • 01:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
  • 01:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1292.eqiad.wmnet with OS bullseye
  • 01:42 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1293.eqiad.wmnet with OS bullseye
  • 01:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1294.eqiad.wmnet with reason: host reimage
  • 01:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1291.eqiad.wmnet with OS bullseye
  • 01:36 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
  • 01:35 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1295.eqiad.wmnet with OS bullseye
  • 01:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1290.eqiad.wmnet with reason: host reimage
  • 01:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1287.eqiad.wmnet with reason: host reimage
  • 01:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1292.eqiad.wmnet with reason: host reimage
  • 01:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1293.eqiad.wmnet with reason: host reimage
  • 01:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1286.eqiad.wmnet with OS bullseye
  • 01:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1285.eqiad.wmnet with OS bullseye
  • 01:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1291.eqiad.wmnet with reason: host reimage
  • 01:15 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1287.eqiad.wmnet with reason: host reimage
  • 01:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1295.eqiad.wmnet with reason: host reimage
  • 01:14 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1294.eqiad.wmnet with reason: host reimage
  • 01:14 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1293.eqiad.wmnet with reason: host reimage
  • 01:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1292.eqiad.wmnet with reason: host reimage
  • 01:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1291.eqiad.wmnet with reason: host reimage
  • 01:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1290.eqiad.wmnet with reason: host reimage
  • 01:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
  • 01:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
  • 01:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1295.eqiad.wmnet with reason: host reimage
  • 01:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1286.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1286.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1287.eqiad.wmnet with OS bullseye
  • 00:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1294.eqiad.wmnet with OS bullseye
  • 00:57 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1293.eqiad.wmnet with OS bullseye
  • 00:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1292.eqiad.wmnet with OS bullseye
  • 00:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1291.eqiad.wmnet with OS bullseye
  • 00:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1290.eqiad.wmnet with OS bullseye
  • 00:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1289.eqiad.wmnet with OS bullseye
  • 00:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1288.eqiad.wmnet with OS bullseye
  • 00:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1295.eqiad.wmnet with OS bullseye
  • 00:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1284.eqiad.wmnet with OS bullseye
  • 00:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1280.eqiad.wmnet with OS bullseye
  • 00:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1281.eqiad.wmnet with OS bullseye
  • 00:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1282.eqiad.wmnet with OS bullseye
  • 00:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1279.eqiad.wmnet with OS bullseye
  • 00:39 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:38 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1283.eqiad.wmnet with OS bullseye
  • 00:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker1284.eqiad.wmnet with reason: host reimage
  • 00:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1280.eqiad.wmnet with reason: host reimage
  • 00:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker1281.eqiad.wmnet with reason: host reimage
  • 00:20 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker1282.eqiad.wmnet with reason: host reimage
  • 00:10 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker1283.eqiad.wmnet with reason: host reimage
  • 00:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1279.eqiad.wmnet with reason: host reimage

2024-08-06

  • 23:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1284.eqiad.wmnet with reason: host reimage
  • 23:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1283.eqiad.wmnet with reason: host reimage
  • 23:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1281.eqiad.wmnet with reason: host reimage
  • 23:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1282.eqiad.wmnet with reason: host reimage
  • 23:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1280.eqiad.wmnet with reason: host reimage
  • 23:56 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1279.eqiad.wmnet with reason: host reimage
  • 23:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1274.eqiad.wmnet with OS bullseye
  • 23:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bullseye
  • 23:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bullseye
  • 23:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1271.eqiad.wmnet with OS bullseye
  • 23:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1272.eqiad.wmnet with OS bullseye
  • 23:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1283.eqiad.wmnet with OS bullseye
  • 23:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1284.eqiad.wmnet with OS bullseye
  • 23:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1282.eqiad.wmnet with OS bullseye
  • 23:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1281.eqiad.wmnet with OS bullseye
  • 23:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1280.eqiad.wmnet with OS bullseye
  • 23:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1279.eqiad.wmnet with OS bullseye
  • 23:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1276.eqiad.wmnet with OS bullseye
  • 23:38 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:37 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1273.eqiad.wmnet with OS bullseye
  • 23:34 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:34 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1277.eqiad.wmnet with OS bullseye
  • 23:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1278.eqiad.wmnet with OS bullseye
  • 23:33 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1274.eqiad.wmnet with reason: host reimage
  • 23:28 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1271.eqiad.wmnet with reason: host reimage
  • 23:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1275.eqiad.wmnet with OS bullseye
  • 23:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1272.eqiad.wmnet with reason: host reimage
  • 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1270.eqiad.wmnet with OS bullseye
  • 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:22 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage
  • 23:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1271.eqiad.wmnet with reason: host reimage
  • 23:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage
  • 23:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1273.eqiad.wmnet with reason: host reimage
  • 23:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1278.eqiad.wmnet with reason: host reimage
  • 23:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1275.eqiad.wmnet with reason: host reimage
  • 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1270.eqiad.wmnet with reason: host reimage
  • 23:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1278.eqiad.wmnet with reason: host reimage
  • 23:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage
  • 23:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage
  • 23:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1275.eqiad.wmnet with reason: host reimage
  • 23:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1274.eqiad.wmnet with reason: host reimage
  • 23:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1273.eqiad.wmnet with reason: host reimage
  • 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1272.eqiad.wmnet with reason: host reimage
  • 23:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1271.eqiad.wmnet with OS bullseye
  • 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1270.eqiad.wmnet with reason: host reimage
  • 23:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1271.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1278.eqiad.wmnet with OS bullseye
  • 22:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1277.eqiad.wmnet with OS bullseye
  • 22:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1276.eqiad.wmnet with OS bullseye
  • 22:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1275.eqiad.wmnet with OS bullseye
  • 22:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1274.eqiad.wmnet with OS bullseye
  • 22:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1273.eqiad.wmnet with OS bullseye
  • 22:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1272.eqiad.wmnet with OS bullseye
  • 22:45 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1270.eqiad.wmnet with OS bullseye
  • 22:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1271.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:41 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1271 - jclark@cumin1002"
  • 22:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1271 - jclark@cumin1002"
  • 22:38 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 22:34 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:32 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:35 kindrobot: UTC late backport window finished <3
  • 21:34 kindrobot@deploy1003: Finished scap: Backport for Promote dark mode for anons on various wikis - take 2 (T371070 T371084), Enable NetworkSession extension for most wikis (T355267), fix(i18n): adjust broken mentorship eligibility copy (T371775 T370318), fix(i18n): adjust broken mentorship eligibility copy (T371775 T370318) (duration: 47m 05s)
  • 21:25 kindrobot@deploy1003: toyofuku, ebernhardson, kindrobot, migr: Continuing with sync
  • 21:21 kindrobot@deploy1003: toyofuku, ebernhardson, kindrobot, migr: Backport for Promote dark mode for anons on various wikis - take 2 (T371070 T371084), Enable NetworkSession extension for most wikis (T355267), fix(i18n): adjust broken mentorship eligibility copy (T371775 T370318), fix(i18n): adjust broken mentorship eligibility copy (T371775 T37031
  • 21:21 brett: start pybal on lvs6002
  • 21:18 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
  • 21:16 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
  • 20:59 brett: stop pybal on lvs6002 for server reboot
  • 20:56 kindrobot: UTC late backport window, deploy is extending beyond deployment window
  • 20:47 kindrobot@deploy1003: Started scap sync-world: Backport for Promote dark mode for anons on various wikis - take 2 (T371070 T371084), Enable NetworkSession extension for most wikis (T355267), fix(i18n): adjust broken mentorship eligibility copy (T371775 T370318), fix(i18n): adjust broken mentorship eligibility copy (T371775 T370318)
  • 20:27 brett: start pybal on lvs4009
  • 20:26 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
  • 20:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4009.ulsfo.wmnet
  • 20:21 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs4009.ulsfo.wmnet
  • 19:57 brett: stop pybal on lvs4009 for server reboot
  • 19:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
  • 19:49 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 02s)
  • 19:49 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 19:44 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 19:44 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 19:42 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 16s)
  • 19:42 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 19:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 18s)
  • 19:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 19:35 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 09s)
  • 19:35 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 19:21 sukhe: start pybal on lvs4008
  • 19:19 sukhe: restart varnishmtail on cp3070
  • 19:16 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4008.ulsfo.wmnet
  • 19:13 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4008.ulsfo.wmnet
  • 19:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2011
  • 19:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2011
  • 19:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2010
  • 19:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2010
  • 19:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2009
  • 19:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2009
  • 18:57 sukhe: sudo cumin "lvs4008*" 'disable-puppet "rebooting" && systemctl stop pybal.service'
  • 18:49 dancy@deploy1003: Finished scap: testing T371904 (duration: 04m 14s)
  • 18:48 sukhe: re-enable pybal on lvs6001
  • 18:47 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
  • 18:45 dancy@deploy1003: Started scap sync-world: testing T371904
  • 18:44 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
  • 18:44 dancy@deploy1003: Finished scap: testing T370934 (duration: 31m 05s)
  • 18:41 brett: start pybal on lvs5005
  • 18:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5005.eqsin.wmnet
  • 18:33 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs5005.eqsin.wmnet
  • 18:28 sukhe: sudo cumin "lvs6001*" 'disable-puppet "rebooting" && systemctl stop pybal.service'
  • 18:18 brett: stop pybal on lvs5005 for server reboot
  • 18:13 dancy@deploy1003: Started scap sync-world: testing T370934
  • 17:53 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5004.eqsin.wmnet
  • 17:51 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs5004.eqsin.wmnet
  • 17:47 sukhe: stop pybal on lvs5004 for server reboot
  • 17:40 mutante: CI - adding a new SSH key to jenkins - in the same file without removing the old key yet - this is expected to have no effect, but if CI breaks will revert - T177826
  • 17:01 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s5
  • 17:01 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
  • 16:56 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1020.eqiad.wmnet with OS bookworm
  • 16:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1023.eqiad.wmnet with OS bullseye
  • 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding payments200 to codfw - jhancock@cumin2002"
  • 16:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding payments200 to codfw - jhancock@cumin2002"
  • 16:35 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:23 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1020.eqiad.wmnet with reason: host reimage
  • 16:21 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1020.eqiad.wmnet with reason: host reimage
  • 16:08 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1020.eqiad.wmnet with OS bookworm
  • 16:08 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent --enable 'upgrading anycast-hc'": finish anycast-hc upgrade: T370068
  • 16:08 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent --enable 'upgrading anycast-hc'": finish anycast-hc upgrade
  • 16:03 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Reimaging clouddb1020 T365424
  • 16:03 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Reimaging clouddb1020 T365424
  • 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2011 to codfw - jhancock@cumin2002"
  • 15:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2011 to codfw - jhancock@cumin2002"
  • 15:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2010 to codfw - jhancock@cumin2002"
  • 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2010 to codfw - jhancock@cumin2002"
  • 15:35 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:30 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:26 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:26 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:25 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns1006.wikimedia.org [reason: [done] anycast-healthchecker 0.9.8 upgrade]
  • 15:25 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2035.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:23 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
  • 15:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2035.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:23 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns1006.wikimedia.org [reason: anycast-healthchecker 0.9.8 upgrade]
  • 15:21 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns1005.wikimedia.org [reason: [done] anycast-healthchecker 0.9.8 upgrade]
  • 15:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:18 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns1005.wikimedia.org [reason: anycast-healthchecker 0.9.8 upgrade]
  • 15:16 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns1004.wikimedia.org [reason: [done] anycast-healthchecker 0.9.8 upgrade]
  • 15:14 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns1004.wikimedia.org [reason: anycast-healthchecker 0.9.8 upgrade]
  • 15:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:11 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:10 cdanis: re-enabling puppet on cp nodes to deploy https://gerrit.wikimedia.org/r/1059126
  • 15:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1296.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:01 cdanis: disabling puppet on cp nodes to deploy https://gerrit.wikimedia.org/r/1059126
  • 14:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1295.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1294.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1293.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1291.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1292.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1290.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1289.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1288.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1286.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1287.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:56 sukhe: disable puppet on A:dnsbox for cluster-wide anycast-hc 0.9.8 upgrade on remaining hosts: T370068
  • 14:55 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns7002.wikimedia.org [reason: [done] anycast-healthchecker 0.9.8 upgrade]
  • 14:53 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns7002.wikimedia.org [reason: anycast-healthchecker 0.9.8 upgrade]
  • 14:53 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns7002.wikimedia.org,service=recdns [reason: anycast-healthchecker 0.9.8 upgrade]
  • 14:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1296.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1270.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1272.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1274.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1275.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:41 _joe_: repool cp4044
  • 14:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1295.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1294.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1293.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1292.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1290.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1289.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1288.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1286.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1291.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1287.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:37 zabe@deploy1003: Finished scap: update interwiki cache (duration: 07m 10s)
  • 14:36 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp3081*} and A:cp for 9.2.5-1wm2
  • 14:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1278.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1282.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1280.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1284.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1283.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1281.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1279.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1277.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1276.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:33 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp3081*} and A:cp for 9.2.5-1wm2
  • 14:30 zabe@deploy1003: Started scap sync-world: update interwiki cache
  • 14:29 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'run-puppet-agent --enable "merging CR #1059123"'
  • 14:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Reimaging clouddb1020 T365424
  • 14:28 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Reimaging clouddb1020 T365424
  • 14:25 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=bdrwiki --cluster=all 2>&1 | tee /tmp/bdrwiki.UpdateSearchIndexConfig.log # T371757
  • 14:24 zabe@deploy1003: Finished scap: Creating bdrwiki (T371757) (duration: 06m 43s)
  • 14:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host vrts2002.codfw.wmnet with OS bookworm
  • 14:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:23 elukey: upgrade debmonitor-server on debmonitor[1,2]003 to version 0.5 - T368744
  • 14:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1272.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1274.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1270.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 zabe@deploy1003: Started scap sync-world: Creating bdrwiki (T371757)
  • 14:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1270.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1274.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1272.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 zabe: Create Wikipedia West Coast Bajau # T371757
  • 14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1270.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1275.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1274.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1272.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2009 to codfw - jhancock@cumin2002"
  • 14:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2009 to codfw - jhancock@cumin2002"
  • 14:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:12 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1278.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1283.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1280.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1277.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1284.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1282.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1281.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1279.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1276.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:08 zabe@deploy1003: Finished scap: Backport for TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455), Fix test that only works in June or July (T371577), TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455) (duration: 13m 22s)
  • 14:07 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:07 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker12 - jclark@cumin1002"
  • 14:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker12 - jclark@cumin1002"
  • 14:03 zabe@deploy1003: abi, zabe: Continuing with sync
  • 14:03 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 14:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2002.codfw.wmnet with reason: host reimage
  • 14:02 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:01 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
  • 14:01 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s5
  • 14:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2002.codfw.wmnet with reason: host reimage
  • 14:00 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:56 zabe@deploy1003: abi, zabe: Backport for TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455), Fix test that only works in June or July (T371577), TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:54 zabe@deploy1003: Started scap sync-world: Backport for TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455), Fix test that only works in June or July (T371577), TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455)
  • 13:54 sukhe: upgrading A:wikidough to pdns-rec 4.8.8
  • 13:53 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'disable-puppet "merging CR #1059123"'
  • 13:51 zabe@deploy1003: Finished scap: T371060 (duration: 07m 57s)
  • 13:43 zabe@deploy1003: Started scap sync-world: T371060
  • 13:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host vrts2002.codfw.wmnet with OS bookworm
  • 13:28 zabe@deploy1003: Finished scap: Backport for mywikisource: add portal, author and translation namespaces (T371060), dtpwiki: add timezone (T371076) (duration: 11m 28s)
  • 13:24 zabe@deploy1003: anzx, zabe: Continuing with sync
  • 13:20 zabe@deploy1003: anzx, zabe: Backport for mywikisource: add portal, author and translation namespaces (T371060), dtpwiki: add timezone (T371076) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:17 zabe@deploy1003: Started scap sync-world: Backport for mywikisource: add portal, author and translation namespaces (T371060), dtpwiki: add timezone (T371076)
  • 13:14 zabe@deploy1003: Finished scap: Backport for group0, frwiki, itwiki: enable shellbox-video (T356241), [Growth] enwiki: Enable frontend for Add Link (T370802) (duration: 10m 41s)
  • 13:13 _joe_: depooling cp4044 from traffic to apply new tls termination templates
  • 13:09 zabe@deploy1003: hnowlan, urbanecm, zabe: Continuing with sync
  • 13:08 zabe@deploy1003: hnowlan, urbanecm, zabe: Backport for group0, frwiki, itwiki: enable shellbox-video (T356241), [Growth] enwiki: Enable frontend for Add Link (T370802) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:03 zabe@deploy1003: Started scap sync-world: Backport for group0, frwiki, itwiki: enable shellbox-video (T356241), [Growth] enwiki: Enable frontend for Add Link (T370802)
  • 12:58 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:58 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:39 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Openjdk upgrade - elukey@cumin1002
  • 12:32 elukey: apt-get purge debmonitor-server + run-puppet-agent to re-install the daemon on debmonitor2003
  • 12:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on debmonitor2003.codfw.wmnet with reason: failover test
  • 12:31 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on debmonitor2003.codfw.wmnet with reason: failover test
  • 12:21 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Openjdk upgrade - elukey@cumin1002
  • 12:16 elukey: restart debmonitor-server on debmonitor1003
  • 12:13 elukey: stop debmonitor-server on debmonitor1003 as temporary test
  • 12:11 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on debmonitor1003.eqiad.wmnet with reason: failover test
  • 12:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on debmonitor1003.eqiad.wmnet with reason: failover test
  • 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T367856)', diff saved to https://phabricator.wikimedia.org/P67232 and previous config saved to /var/cache/conftool/dbconfig/20240806-100756-marostegui.json
  • 10:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367856)', diff saved to https://phabricator.wikimedia.org/P67231 and previous config saved to /var/cache/conftool/dbconfig/20240806-100734-marostegui.json
  • 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P67229 and previous config saved to /var/cache/conftool/dbconfig/20240806-095226-marostegui.json
  • 09:41 joe: upgrading conftool to 3.2.1 everywhere T369606
  • 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P67228 and previous config saved to /var/cache/conftool/dbconfig/20240806-093719-marostegui.json
  • 09:24 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Openjdk upgrade - elukey@cumin1002
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367856)', diff saved to https://phabricator.wikimedia.org/P67227 and previous config saved to /var/cache/conftool/dbconfig/20240806-092212-marostegui.json
  • 09:07 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Openjdk upgrade - elukey@cumin1002
  • 09:02 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Openjdk upgrade - elukey@cumin1002
  • 08:43 topranks: shutting cloudsw1-d5-eqiad <-> cloudsw1-e4-eqiad link
  • 08:42 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Openjdk upgrade - elukey@cumin1002
  • 08:16 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.17 refs T366962
  • 08:16 elukey: powercycle wdqs1023, misbehaving and not responding to ssh anymore
  • 08:12 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=wdqs1023.eqiad.wmnet
  • 07:50 kart_: Updated cxserver to 2024-08-05-063332-production (T371760, T357950)
  • 07:49 oblivian@puppetserver1002: conftool action : set/weight=10; selector: cluster=videoscaler,name=mw1407.eqiad.wmnet
  • 07:49 oblivian@puppetserver1002: conftool action : set/weight=1; selector: cluster=videoscaler,name=mw1407.eqiad.wmnet
  • 07:46 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:45 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:44 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 07:44 _joe_: uploaded conftool 3.2.1 to apt.wikimedia.org
  • 07:43 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 07:42 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:42 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 07:40 kart_: Updated MinT to 2024-08-05-062247-production (T363308, T355304, T368521)
  • 07:37 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:34 elukey: powercycle ml-serve2001 - host seems frozen, DIMM errors registered in `getsel`
  • 07:28 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 07:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 142108
  • 07:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262725
  • 07:15 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262725
  • 07:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61928
  • 07:15 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61928
  • 07:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 265158
  • 07:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 265158
  • 07:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 264014
  • 07:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 264014
  • 07:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61642
  • 07:13 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61642
  • 07:13 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 07:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15169
  • 07:05 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:03 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:58 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:50 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15169
  • 05:41 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
  • 04:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
  • 04:38 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1021.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
  • 04:37 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 09s)
  • 04:37 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 04:36 ryankemper@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1023.eqiad.wmnet with OS bullseye
  • 04:36 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 09s)
  • 04:36 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.14 (duration: 00m 58s)
  • 03:47 mwpresync@deploy1003: Finished scap: testwikis to 1.43.0-wmf.17 refs T366962 (duration: 45m 05s)
  • 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.17 refs T366962

2024-08-05

  • 20:47 cjming: end of UTC late backport window
  • 20:44 cjming@deploy1003: Finished scap: Backport for Add wikibase client interaction stream to Event Logging (T370045) (duration: 22m 52s)
  • 20:39 cjming@deploy1003: cjming, joelyrookewmde: Continuing with sync
  • 20:23 cjming@deploy1003: cjming, joelyrookewmde: Backport for Add wikibase client interaction stream to Event Logging (T370045) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:21 cjming@deploy1003: Started scap sync-world: Backport for Add wikibase client interaction stream to Event Logging (T370045)
  • 19:29 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1021.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
  • 19:14 otto@deploy1003: Finished scap: Backport for eventbus: enable instrumentation on all wikis (T363587) (duration: 07m 08s)
  • 19:10 otto@deploy1003: otto: Continuing with sync
  • 19:09 otto@deploy1003: otto: Backport for eventbus: enable instrumentation on all wikis (T363587) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:07 otto@deploy1003: Started scap sync-world: Backport for eventbus: enable instrumentation on all wikis (T363587)
  • 18:56 dancy@deploy1003: sync-world aborted: testing scap 4.96.0 (duration: 03m 11s)
  • 18:53 dancy@deploy1003: Started scap sync-world: testing scap 4.96.0
  • 18:52 dancy@deploy1003: Installation of scap version "4.96.0" completed for 211 hosts
  • 18:52 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1021.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
  • 18:52 dancy@deploy1003: Installing scap version "4.96.0" for 211 hosts
  • 18:27 dancy@deploy1003: Started scap sync-world: testing updates to repos/releng/release/make-container-image
  • 17:28 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1023.eqiad.wmnet with reason: host reimage
  • 17:25 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1023.eqiad.wmnet with reason: host reimage
  • 17:04 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
  • 16:52 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:52 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:52 mutante: DNS - added new project language 'bdr' - West Coast Bajau - https://en.wikipedia.org/wiki/Sama%E2%80%93Bajaw_languages - T371757
  • 16:36 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2008.codfw.wmnet with OS bookworm
  • 16:33 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2007.codfw.wmnet with OS bookworm
  • 16:20 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:19 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:18 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
  • 16:15 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
  • 16:12 filippo@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
  • 16:11 filippo@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
  • 16:11 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:10 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:53 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host prometheus2008.codfw.wmnet with OS bookworm
  • 15:42 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
  • 15:41 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2007.codfw.wmnet with OS bookworm
  • 15:40 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
  • 15:39 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2007.codfw.wmnet with OS bookworm
  • 15:39 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:38 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:29 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
  • 15:27 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:26 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:22 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:22 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:16 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1019.eqiad.wmnet with OS bookworm
  • 15:15 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
  • 15:15 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s6
  • 15:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2239.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:03 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2239.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2240.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:52 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:52 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2240.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2238.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:43 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1006.eqiad.wmnet
  • 14:36 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:35 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:35 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2238.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:35 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 14:25 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:25 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:20 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2007.codfw.wmnet with OS bookworm
  • 14:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
  • 14:11 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
  • 14:04 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2237.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:02 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1019.eqiad.wmnet with reason: host reimage
  • 14:01 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2237.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2236.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 13:59 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1019.eqiad.wmnet with reason: host reimage
  • 13:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2236.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 13:44 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1019.eqiad.wmnet with OS bookworm
  • 13:39 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
  • 13:20 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1006.eqiad.wmnet with OS bookworm
  • 13:07 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1006.eqiad.wmnet with reason: host reimage
  • 13:04 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
  • 13:03 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1006.eqiad.wmnet with reason: host reimage
  • 13:00 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
  • 12:58 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
  • 12:57 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-web1001.eqiad.wmnet
  • 12:57 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
  • 12:55 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
  • 12:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
  • 12:52 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1006.eqiad.wmnet with OS bookworm
  • 12:52 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-web1001.eqiad.wmnet
  • 12:11 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1005.eqiad.wmnet with OS bookworm
  • 11:56 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1005.eqiad.wmnet with reason: host reimage
  • 11:53 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1005.eqiad.wmnet with reason: host reimage
  • 11:42 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 11:42 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1005.eqiad.wmnet with OS bookworm
  • 11:34 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 11:33 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1005.eqiad.wmnet
  • 11:27 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1005.eqiad.wmnet
  • 11:22 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1004.eqiad.wmnet
  • 11:20 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
  • 11:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary and A:netbox-all
  • 11:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1004.eqiad.wmnet with OS bookworm
  • 11:16 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1004.eqiad.wmnet
  • 11:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1016.eqiad.wmnet
  • 11:12 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary and A:netbox-all
  • 11:11 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
  • 11:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1002.eqiad.wmnet
  • 11:06 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host snapshot1016.eqiad.wmnet
  • 11:06 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1015.eqiad.wmnet
  • 11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
  • 11:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T367856)', diff saved to https://phabricator.wikimedia.org/P67222 and previous config saved to /var/cache/conftool/dbconfig/20240805-110512-marostegui.json
  • 11:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367856)', diff saved to https://phabricator.wikimedia.org/P67221 and previous config saved to /var/cache/conftool/dbconfig/20240805-110450-marostegui.json
  • 11:04 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-client1002.eqiad.wmnet
  • 11:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1004.eqiad.wmnet with reason: host reimage
  • 11:00 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1004.eqiad.wmnet with reason: host reimage
  • 11:00 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet
  • 11:00 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1014.eqiad.wmnet
  • 10:59 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
  • 10:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet
  • 10:50 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
  • 10:49 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
  • 10:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P67220 and previous config saved to /var/cache/conftool/dbconfig/20240805-104943-marostegui.json
  • 10:49 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-conf1004.eqiad.wmnet with OS bookworm
  • 10:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
  • 10:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
  • 10:37 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet
  • 10:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts clouddb1021.eqiad.wmnet
  • 10:36 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:36 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 10:35 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 10:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P67219 and previous config saved to /var/cache/conftool/dbconfig/20240805-103437-marostegui.json
  • 10:34 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
  • 10:31 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 10:30 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
  • 10:24 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts clouddb1021.eqiad.wmnet
  • 10:22 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@537b288]: (no justification provided) (duration: 00m 36s)
  • 10:22 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@537b288]: (no justification provided)
  • 10:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367856)', diff saved to https://phabricator.wikimedia.org/P67218 and previous config saved to /var/cache/conftool/dbconfig/20240805-101930-marostegui.json
  • 09:52 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:48 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:48 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:44 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:40 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:39 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:38 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:38 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:35 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
  • 09:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddumps1002.wikimedia.org
  • 09:27 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddumps1002.wikimedia.org
  • 09:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddumps1001.wikimedia.org
  • 09:16 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddumps1001.wikimedia.org
  • 08:30 zabe@deploy1003: Finished scap: Backport for noc: Provide db-sections.php (duration: 22m 04s)
  • 08:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox2003.codfw.wmnet
  • 08:28 ayounsi@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netbox2003.codfw.wmnet
  • 08:20 zabe@deploy1003: zabe: Continuing with sync
  • 08:20 zabe@deploy1003: zabe: Backport for noc: Provide db-sections.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:20 vgutierrez: manually removing wmf_auto_restart_benthos@haproxy_cache.service on cp4037 - T370741
  • 08:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox1003.eqiad.wmnet
  • 08:11 ayounsi@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netbox1003.eqiad.wmnet
  • 08:11 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 08:08 zabe@deploy1003: Started scap sync-world: Backport for noc: Provide db-sections.php
  • 08:02 zabe: zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki 'Lirielmartinss' 'Ligg89' # T371784
  • 08:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki "It'sMogli" 'ItsMogli' # T371784
  • 06:55 XioNoX: push `LVS-service-ips` rename to ssw1-d8-codfw
  • 06:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox1003.eqiad.wmnet

2024-08-04

  • 15:44 mnz@deploy1003: Finished deploy [airflow-dags/research@d573c40]: (no justification provided) (duration: 00m 31s)
  • 15:44 mnz@deploy1003: Started deploy [airflow-dags/research@d573c40]: (no justification provided)
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T367856)', diff saved to https://phabricator.wikimedia.org/P67217 and previous config saved to /var/cache/conftool/dbconfig/20240804-113742-marostegui.json
  • 11:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 11:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367856)', diff saved to https://phabricator.wikimedia.org/P67216 and previous config saved to /var/cache/conftool/dbconfig/20240804-113720-marostegui.json
  • 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P67215 and previous config saved to /var/cache/conftool/dbconfig/20240804-112213-marostegui.json
  • 11:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P67214 and previous config saved to /var/cache/conftool/dbconfig/20240804-110706-marostegui.json
  • 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367856)', diff saved to https://phabricator.wikimedia.org/P67213 and previous config saved to /var/cache/conftool/dbconfig/20240804-105159-marostegui.json
  • 05:54 ryankemper: [WDQS] Restart wdqs2010 to fix free allocators error

2024-08-03

  • 16:53 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1022.eqiad.wmnet with OS bullseye
  • 16:15 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
  • 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T367856)', diff saved to https://phabricator.wikimedia.org/P67212 and previous config saved to /var/cache/conftool/dbconfig/20240803-100308-marostegui.json
  • 10:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367856)', diff saved to https://phabricator.wikimedia.org/P67211 and previous config saved to /var/cache/conftool/dbconfig/20240803-100228-marostegui.json
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P67210 and previous config saved to /var/cache/conftool/dbconfig/20240803-094721-marostegui.json
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P67209 and previous config saved to /var/cache/conftool/dbconfig/20240803-093214-marostegui.json
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367856)', diff saved to https://phabricator.wikimedia.org/P67208 and previous config saved to /var/cache/conftool/dbconfig/20240803-091707-marostegui.json
  • 03:09 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1260.eqiad.wmnet with OS bullseye
  • 02:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1023.eqiad.wmnet with OS bullseye
  • 02:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1269.eqiad.wmnet with OS bullseye
  • 02:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1266.eqiad.wmnet with OS bullseye
  • 02:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1269.eqiad.wmnet with reason: host reimage
  • 02:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1269.eqiad.wmnet with reason: host reimage
  • 01:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1022.eqiad.wmnet with reason: host reimage
  • 01:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1022.eqiad.wmnet with reason: host reimage
  • 01:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1260.eqiad.wmnet with OS bullseye
  • 01:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1268.eqiad.wmnet with OS bullseye
  • 01:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1269.eqiad.wmnet with OS bullseye
  • 01:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1267.eqiad.wmnet with OS bullseye
  • 01:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1260.eqiad.wmnet with OS bullseye
  • 01:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1268.eqiad.wmnet with reason: host reimage
  • 01:29 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
  • 01:28 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1022.eqiad.wmnet with OS bullseye
  • 01:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1268.eqiad.wmnet with reason: host reimage
  • 01:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1267.eqiad.wmnet with reason: host reimage
  • 01:25 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1267.eqiad.wmnet with reason: host reimage
  • 01:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on 9 hosts with reason: T364368 rejiggering hosts
  • 01:15 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on 9 hosts with reason: T364368 rejiggering hosts
  • 01:14 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1021.eqiad.wmnet with reason: host reimage
  • 01:12 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1021.eqiad.wmnet with reason: host reimage
  • 01:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1268.eqiad.wmnet with OS bullseye
  • 01:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1265.eqiad.wmnet with OS bullseye
  • 01:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:10 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1267.eqiad.wmnet with OS bullseye
  • 01:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1264.eqiad.wmnet with OS bullseye
  • 01:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1263.eqiad.wmnet with OS bullseye
  • 01:05 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 01:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1262.eqiad.wmnet with OS bullseye
  • 00:57 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1266.eqiad.wmnet with OS bullseye
  • 00:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1261.eqiad.wmnet with OS bullseye
  • 00:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on wdqs[2021-2022,2024-2025].codfw.wmnet with reason: T364368 rejiggering hosts
  • 00:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on wdqs[2021-2022,2024-2025].codfw.wmnet with reason: T364368 rejiggering hosts
  • 00:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1265.eqiad.wmnet with reason: host reimage
  • 00:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1264.eqiad.wmnet with reason: host reimage
  • 00:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1265.eqiad.wmnet with reason: host reimage
  • 00:49 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
  • 00:47 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1264.eqiad.wmnet with reason: host reimage
  • 00:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1263.eqiad.wmnet with reason: host reimage
  • 00:44 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1263.eqiad.wmnet with reason: host reimage
  • 00:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1262.eqiad.wmnet with reason: host reimage
  • 00:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1261.eqiad.wmnet with reason: host reimage
  • 00:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1262.eqiad.wmnet with reason: host reimage
  • 00:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1261.eqiad.wmnet with reason: host reimage
  • 00:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1265.eqiad.wmnet with OS bullseye
  • 00:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1258.eqiad.wmnet with OS bullseye
  • 00:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1264.eqiad.wmnet with OS bullseye
  • 00:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1259.eqiad.wmnet with OS bullseye
  • 00:29 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:29 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:27 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1263.eqiad.wmnet with OS bullseye
  • 00:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1258.eqiad.wmnet with reason: host reimage
  • 00:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1259.eqiad.wmnet with reason: host reimage
  • 00:18 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1262.eqiad.wmnet with OS bullseye
  • 00:18 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1261.eqiad.wmnet with OS bullseye
  • 00:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1256.eqiad.wmnet with OS bullseye
  • 00:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:17 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1258.eqiad.wmnet with reason: host reimage
  • 00:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:17 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1259.eqiad.wmnet with reason: host reimage
  • 00:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1260.eqiad.wmnet with OS bullseye
  • 00:14 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1259.eqiad.wmnet with OS bullseye
  • 00:14 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1258.eqiad.wmnet with OS bullseye
  • 00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1254.eqiad.wmnet with OS bullseye
  • 00:13 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1255.eqiad.wmnet with OS bullseye
  • 00:07 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
  • 00:06 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1257.eqiad.wmnet with OS bullseye
  • 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1254.eqiad.wmnet with reason: host reimage

2024-08-02

  • 23:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1255.eqiad.wmnet with reason: host reimage
  • 23:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1257.eqiad.wmnet with reason: host reimage
  • 23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1255.eqiad.wmnet with reason: host reimage
  • 23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
  • 23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1254.eqiad.wmnet with reason: host reimage
  • 23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1257.eqiad.wmnet with reason: host reimage
  • 23:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1255.eqiad.wmnet with OS bullseye
  • 23:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1256.eqiad.wmnet with OS bullseye
  • 23:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1254.eqiad.wmnet with OS bullseye
  • 23:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1257.eqiad.wmnet with OS bullseye
  • 23:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1253.eqiad.wmnet with OS bullseye
  • 23:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1252.eqiad.wmnet with OS bullseye
  • 23:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1251.eqiad.wmnet with OS bullseye
  • 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1250.eqiad.wmnet with OS bullseye
  • 23:36 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1253.eqiad.wmnet with reason: host reimage
  • 23:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1253.eqiad.wmnet with reason: host reimage
  • 23:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1252.eqiad.wmnet with reason: host reimage
  • 23:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1253.eqiad.wmnet with OS bullseye
  • 23:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1251.eqiad.wmnet with reason: host reimage
  • 23:29 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1252.eqiad.wmnet with reason: host reimage
  • 23:26 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1251.eqiad.wmnet with reason: host reimage
  • 23:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1252.eqiad.wmnet with OS bullseye
  • 23:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
  • 23:24 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1251.eqiad.wmnet with OS bullseye
  • 23:24 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
  • 23:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1250.eqiad.wmnet with OS bullseye
  • 23:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:55 ejegg: standalone (IPN listener) SmashPig upgraded from 1b2d9a6e to 5e784691
  • 16:01 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@d573c40]: Deploy latest DAGs for analytics Airflow instance. T368756 (duration: 01m 02s)
  • 16:00 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@d573c40]: Deploy latest DAGs for analytics Airflow instance. T368756
  • 15:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2235.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:05 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2235.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2234.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:53 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2234.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2233.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2233.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2232.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2232.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:34 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2231.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2231.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2008.codfw.wmnet with OS bookworm
  • 13:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
  • 13:52 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
  • 13:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2008.codfw.wmnet with OS bookworm
  • 13:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2007.codfw.wmnet with OS bookworm
  • 13:44 sukhe: running authdns-update for CR: 1059362 T371304
  • 13:44 sukhe: running authdns-update for CR: T371304 1059362
  • 13:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
  • 13:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host alert2002.wikimedia.org with OS bookworm
  • 13:35 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
  • 13:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
  • 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
  • 13:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
  • 13:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
  • 13:11 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "prometheus - ayounsi@cumin1002"
  • 13:10 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "prometheus - ayounsi@cumin1002"
  • 11:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:23 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "alert2002 - ayounsi@cumin1002"
  • 10:18 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "alert2002 - ayounsi@cumin1002"
  • 10:18 elukey: manually start dump_cloud_ip_ranges.service on puppetmaster1001 as test
  • 10:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T367856)', diff saved to https://phabricator.wikimedia.org/P67203 and previous config saved to /var/cache/conftool/dbconfig/20240802-090649-marostegui.json
  • 09:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 09:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367856)', diff saved to https://phabricator.wikimedia.org/P67202 and previous config saved to /var/cache/conftool/dbconfig/20240802-090627-marostegui.json
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P67201 and previous config saved to /var/cache/conftool/dbconfig/20240802-085119-marostegui.json
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P67200 and previous config saved to /var/cache/conftool/dbconfig/20240802-083612-marostegui.json
  • 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367856)', diff saved to https://phabricator.wikimedia.org/P67199 and previous config saved to /var/cache/conftool/dbconfig/20240802-082105-marostegui.json
  • 08:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:37 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Sbailey out of all services on: 2241 hosts
  • 07:36 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging Sbailey out of all services on: 2241 hosts
  • 02:09 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1260
  • 02:08 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1260
  • 02:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1267.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:07 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1261.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1266.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1269.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:03 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:01 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1263.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1268.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1262.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1265.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1264.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1267.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1268.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1269.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1266.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1264.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1265.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1261.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:28 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1263.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1261.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1262.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1261.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:25 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:25 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1260-9 - jclark@cumin1002"
  • 01:25 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1260-9 - jclark@cumin1002"
  • 01:22 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 01:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1250.eqiad.wmnet with OS bullseye
  • 00:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
  • 00:57 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1257.eqiad.wmnet with OS bullseye
  • 00:55 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
  • 00:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1252.eqiad.wmnet with OS bullseye
  • 00:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1250.eqiad.wmnet with OS bullseye
  • 00:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1250.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1259.eqiad.wmnet with OS bullseye
  • 00:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1250.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:48 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1253.eqiad.wmnet with OS bullseye
  • 00:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1258.eqiad.wmnet with OS bullseye
  • 00:43 zabe@deploy1003: Finished scap: Backport for Further configurations for u4cwiki (T371452) (duration: 07m 24s)
  • 00:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1254.eqiad.wmnet with OS bullseye
  • 00:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1257.eqiad.wmnet with reason: host reimage
  • 00:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1255.eqiad.wmnet with OS bullseye
  • 00:39 zabe@deploy1003: zabe: Continuing with sync
  • 00:38 zabe@deploy1003: zabe: Backport for Further configurations for u4cwiki (T371452) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:38 zabe: zabe@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php u4cwiki translate # T371452
  • 00:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1252.eqiad.wmnet with reason: host reimage
  • 00:36 zabe@deploy1003: Started scap sync-world: Backport for Further configurations for u4cwiki (T371452)
  • 00:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1256.eqiad.wmnet with OS bullseye
  • 00:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1259.eqiad.wmnet with reason: host reimage
  • 00:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1253.eqiad.wmnet with reason: host reimage
  • 00:30 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1250.eqiad.wmnet with OS bullseye
  • 00:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
  • 00:28 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1251.eqiad.wmnet with OS bullseye
  • 00:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1258.eqiad.wmnet with reason: host reimage
  • 00:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1254.eqiad.wmnet with reason: host reimage
  • 00:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1255.eqiad.wmnet with reason: host reimage
  • 00:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
  • 00:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
  • 00:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
  • 00:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
  • 00:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
  • 00:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1255.eqiad.wmnet with reason: host reimage
  • 00:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1251.eqiad.wmnet with reason: host reimage
  • 00:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
  • 00:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1258.eqiad.wmnet with reason: host reimage
  • 00:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1259.eqiad.wmnet with reason: host reimage
  • 00:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1257.eqiad.wmnet with reason: host reimage
  • 00:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1254.eqiad.wmnet with reason: host reimage
  • 00:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1252.eqiad.wmnet with reason: host reimage
  • 00:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1253.eqiad.wmnet with reason: host reimage
  • 00:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
  • 00:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1251.eqiad.wmnet with reason: host reimage

2024-08-01

  • 23:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1255.eqiad.wmnet with OS bullseye
  • 23:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1256.eqiad.wmnet with OS bullseye
  • 23:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1259.eqiad.wmnet with OS bullseye
  • 23:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1258.eqiad.wmnet with OS bullseye
  • 23:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1257.eqiad.wmnet with OS bullseye
  • 23:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1254.eqiad.wmnet with OS bullseye
  • 23:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1253.eqiad.wmnet with OS bullseye
  • 23:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1252.eqiad.wmnet with OS bullseye
  • 23:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1251.eqiad.wmnet with OS bullseye
  • 23:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1250.eqiad.wmnet with OS bullseye
  • 23:37 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:36 zabe@deploy1003: Finished scap: Backport for Automatically set db section to s5 for new wiki (duration: 07m 20s)
  • 23:34 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 23:31 zabe@deploy1003: zabe: Continuing with sync
  • 23:31 zabe@deploy1003: zabe: Backport for Automatically set db section to s5 for new wiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:28 zabe@deploy1003: Started scap sync-world: Backport for Automatically set db section to s5 for new wiki
  • 22:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367856)', diff saved to https://phabricator.wikimedia.org/P67198 and previous config saved to /var/cache/conftool/dbconfig/20240801-223711-marostegui.json
  • 22:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P67197 and previous config saved to /var/cache/conftool/dbconfig/20240801-222204-marostegui.json
  • 22:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P67196 and previous config saved to /var/cache/conftool/dbconfig/20240801-220657-marostegui.json
  • 21:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367856)', diff saved to https://phabricator.wikimedia.org/P67195 and previous config saved to /var/cache/conftool/dbconfig/20240801-215150-marostegui.json
  • 20:40 thcipriani: utc late window complete
  • 20:28 thcipriani@deploy1003: Finished scap: Backport for revisionCheck: skip null wikiPages (T371348) (duration: 09m 19s)
  • 20:23 thcipriani@deploy1003: thcipriani, jsn: Continuing with sync
  • 20:20 thcipriani@deploy1003: thcipriani, jsn: Backport for revisionCheck: skip null wikiPages (T371348) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:18 thcipriani@deploy1003: Started scap sync-world: Backport for revisionCheck: skip null wikiPages (T371348)
  • 20:01 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:01 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: decomission of frdb2002, payments2001, and payments2002 - dwisehaupt@cumin1002"
  • 20:01 dwisehaupt@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: decomission of frdb2002, payments2001, and payments2002 - dwisehaupt@cumin1002"
  • 19:56 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
  • 19:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
  • 19:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
  • 18:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=(cdn|ats-be)
  • 18:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
  • 18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
  • 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
  • 18:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
  • 18:10 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.16 refs T366961
  • 18:00 brennen: 1.43.0-wmf.16 train (T366961): no current blockers, logs cluttered but not too scary, rolling to all wikis.
  • 17:58 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 17:58 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 17:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2007.codfw.wmnet with OS bookworm
  • 17:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
  • 17:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
  • 17:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
  • 17:21 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 17:19 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2008.codfw.wmnet with OS bookworm
  • 16:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2007.codfw.wmnet with OS bookworm
  • 16:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
  • 16:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
  • 16:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
  • 16:25 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
  • 16:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2008.codfw.wmnet with OS bookworm
  • 16:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
  • 16:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2007.codfw.wmnet with OS bookworm
  • 16:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2008.codfw.wmnet with OS bookworm
  • 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
  • 15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
  • 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
  • 15:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 15:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 15:47 volans: installing spicerack v8.10.0 to cumin1002
  • 15:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
  • 15:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 15:45 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 15:44 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:43 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:34 jgiannelos@deploy1003: Finished deploy [restbase/deploy@f696b76]: (no justification provided) (duration: 17m 07s)
  • 15:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2008.codfw.wmnet with OS bookworm
  • 15:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
  • 15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['prometheus2008']
  • 15:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['prometheus2008']
  • 15:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host prometheus2008.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:23 volans: installing spicerack v8.10.0 to cumin2002
  • 15:17 jgiannelos@deploy1003: Started deploy [restbase/deploy@f696b76]: (no justification provided)
  • 15:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['prometheus2007']
  • 15:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['prometheus2007']
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host prometheus2007.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:04 elukey: rollback debmonitor-server to 0.4.0-3 on debmonitor2003
  • 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host prometheus2008.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host prometheus2007.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding prometheus2007 to codfw - jhancock@cumin2002"
  • 15:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding prometheus2007 to codfw - jhancock@cumin2002"
  • 14:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host kubestage1003.eqiad.wmnet
  • 14:54 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host kubestage1003.eqiad.wmnet
  • 14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) for host kubestage1003.eqiad.wmnet
  • 14:53 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node for host kubestage1003.eqiad.wmnet
  • 14:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:46 zabe@deploy1003: Finished scap: Backport for Move section mapping to separate file (duration: 08m 06s)
  • 14:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:41 zabe@deploy1003: zabe: Continuing with sync
  • 14:40 zabe@deploy1003: zabe: Backport for Move section mapping to separate file synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:38 zabe@deploy1003: Started scap sync-world: Backport for Move section mapping to separate file
  • 14:34 elukey: uploaded spicerack_8.10.0 to apt.wikimedia.org bullseye-wikimedia
  • 14:31 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 14:31 fabfur: repool cp4037 (T370741)
  • 14:28 elukey: upgrade debmonitor-server on debmonitor2003 to 0.5.0
  • 14:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
  • 14:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
  • 14:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
  • 14:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
  • 14:16 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
  • 14:05 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 13:52 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 13:49 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) for host kubestage1003.eqiad.wmnet
  • 13:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node for host kubestage1003.eqiad.wmnet
  • 13:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 13:44 cdanis@deploy1003: Finished scap: Backport for Increase IP cap limit for azwiki (T371439) (duration: 07m 28s)
  • 13:40 cdanis@deploy1003: cdanis, nmw03: Continuing with sync
  • 13:40 cdanis@deploy1003: cdanis, nmw03: Backport for Increase IP cap limit for azwiki (T371439) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 13:37 cdanis@deploy1003: Started scap sync-world: Backport for Increase IP cap limit for azwiki (T371439)
  • 13:19 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 13:18 fabfur: depool cp4037 to test remove benthos package / conffiles (T370741)
  • 13:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns4003.wikimedia.org,service=recdns [reason: [done] pdns-rec upgrade]
  • 13:06 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns4003.wikimedia.org,service=recdns [reason: pdns-rec upgrade]
  • 13:03 isaranto@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 13:00 urbanecm@deploy1003: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync
  • 12:59 urbanecm@deploy1003: helmfile [staging] START helmfile.d/services/linkrecommendation: sync
  • 12:59 urbanecm@deploy1003: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync
  • 12:58 urbanecm@deploy1003: helmfile [codfw] START helmfile.d/services/linkrecommendation: sync
  • 12:55 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync
  • 12:55 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: sync
  • 12:55 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 12:55 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 12:52 elukey@cumin1002: START - Cookbook sre.hosts.provision for host gerrit2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 12:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
  • 12:39 urbanecm: Decommission Add Link models for akwiki, nawiki (T371598)
  • 12:37 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
  • 12:26 isaranto@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:19 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=dewiki --olderThan=1721045915 --verbose # T371597
  • 12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
  • 12:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['alert2002']
  • 12:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['vrts2002']
  • 12:10 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['alert2002']
  • 12:10 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['vrts2002']
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) for host kubestage1003.eqiad.wmnet
  • 12:09 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node for host kubestage1003.eqiad.wmnet
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) for host kubestage1003.eqiad.wmnet
  • 12:06 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node for host kubestage1003.eqiad.wmnet
  • 11:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 11:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 11:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 11:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 11:48 kevinbazira@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67192 and previous config saved to /var/cache/conftool/dbconfig/20240801-113108-root.json
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67191 and previous config saved to /var/cache/conftool/dbconfig/20240801-111602-root.json
  • 11:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67190 and previous config saved to /var/cache/conftool/dbconfig/20240801-110057-root.json
  • 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67189 and previous config saved to /var/cache/conftool/dbconfig/20240801-104551-root.json
  • 10:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67188 and previous config saved to /var/cache/conftool/dbconfig/20240801-103046-root.json
  • 10:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67187 and previous config saved to /var/cache/conftool/dbconfig/20240801-101541-root.json
  • 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67186 and previous config saved to /var/cache/conftool/dbconfig/20240801-100035-root.json
  • 09:54 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:44 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1006.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1233', diff saved to https://phabricator.wikimedia.org/P67185 and previous config saved to /var/cache/conftool/dbconfig/20240801-093123-marostegui.json
  • 09:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1006.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:24 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1005.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:16 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1005.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1004.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:00 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1004.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2230.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 08:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2230.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 08:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 08:48 ayounsi@cumin1002: START - Cookbook sre.postgresql.postgres-init
  • 08:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2229.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T367856)', diff saved to https://phabricator.wikimedia.org/P67184 and previous config saved to /var/cache/conftool/dbconfig/20240801-084409-marostegui.json
  • 08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367856)', diff saved to https://phabricator.wikimedia.org/P67183 and previous config saved to /var/cache/conftool/dbconfig/20240801-084347-marostegui.json
  • 08:35 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2229.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P67182 and previous config saved to /var/cache/conftool/dbconfig/20240801-082840-marostegui.json
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P67181 and previous config saved to /var/cache/conftool/dbconfig/20240801-081333-marostegui.json
  • 08:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 08:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "netbox4 sync - ayounsi@cumin1002"
  • 08:04 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "netbox4 sync - ayounsi@cumin1002"
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367856)', diff saved to https://phabricator.wikimedia.org/P67180 and previous config saved to /var/cache/conftool/dbconfig/20240801-075826-marostegui.json
  • 07:47 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 07:47 ayounsi@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox 4 sync - ayounsi@cumin1002"
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T367856)', diff saved to https://phabricator.wikimedia.org/P67179 and previous config saved to /var/cache/conftool/dbconfig/20240801-074507-marostegui.json
  • 07:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 07:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 07:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367856)', diff saved to https://phabricator.wikimedia.org/P67178 and previous config saved to /var/cache/conftool/dbconfig/20240801-074445-marostegui.json
  • 07:43 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deploy1002.eqiad.wmnet
  • 07:43 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:41 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 07:39 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox 4 sync - ayounsi@cumin1002"
  • 07:36 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 07:36 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 07:32 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67177 and previous config saved to /var/cache/conftool/dbconfig/20240801-072938-marostegui.json
  • 07:21 akosiaris@cumin1002: START - Cookbook sre.hosts.decommission for hosts deploy1002.eqiad.wmnet
  • 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67176 and previous config saved to /var/cache/conftool/dbconfig/20240801-071431-marostegui.json
  • 07:04 akosiaris: uncordon parse2001, parse1001 T359387
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367856)', diff saved to https://phabricator.wikimedia.org/P67175 and previous config saved to /var/cache/conftool/dbconfig/20240801-065924-marostegui.json
  • 06:48 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 06:45 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 06:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:39 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 01:01 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:58 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 00:53 sukhe: run authdns-update
