Server Admin Log/Archive 84

2024-08-31

15:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T371742)', diff saved to https://phabricator.wikimedia.org/P68498 and previous config saved to /var/cache/conftool/dbconfig/20240831-155331-ladsgroup.json
15:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P68497 and previous config saved to /var/cache/conftool/dbconfig/20240831-153824-ladsgroup.json
15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T370903)', diff saved to https://phabricator.wikimedia.org/P68496 and previous config saved to /var/cache/conftool/dbconfig/20240831-153309-ladsgroup.json
15:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P68495 and previous config saved to /var/cache/conftool/dbconfig/20240831-152317-ladsgroup.json
15:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P68494 and previous config saved to /var/cache/conftool/dbconfig/20240831-151802-ladsgroup.json
15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T371742)', diff saved to https://phabricator.wikimedia.org/P68493 and previous config saved to /var/cache/conftool/dbconfig/20240831-150810-ladsgroup.json
15:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P68492 and previous config saved to /var/cache/conftool/dbconfig/20240831-150254-ladsgroup.json
14:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T367856)', diff saved to https://phabricator.wikimedia.org/P68491 and previous config saved to /var/cache/conftool/dbconfig/20240831-145733-marostegui.json
14:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2181.codfw.wmnet with reason: Maintenance
14:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2181.codfw.wmnet with reason: Maintenance
14:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T367856)', diff saved to https://phabricator.wikimedia.org/P68490 and previous config saved to /var/cache/conftool/dbconfig/20240831-145712-marostegui.json
14:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T370903)', diff saved to https://phabricator.wikimedia.org/P68489 and previous config saved to /var/cache/conftool/dbconfig/20240831-144748-ladsgroup.json
14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P68488 and previous config saved to /var/cache/conftool/dbconfig/20240831-144204-marostegui.json
14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T370903)', diff saved to https://phabricator.wikimedia.org/P68487 and previous config saved to /var/cache/conftool/dbconfig/20240831-143348-ladsgroup.json
14:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2216.codfw.wmnet with reason: Maintenance
14:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2216.codfw.wmnet with reason: Maintenance
14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68486 and previous config saved to /var/cache/conftool/dbconfig/20240831-143326-ladsgroup.json
14:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P68485 and previous config saved to /var/cache/conftool/dbconfig/20240831-142657-marostegui.json
14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P68484 and previous config saved to /var/cache/conftool/dbconfig/20240831-141819-ladsgroup.json
14:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T367856)', diff saved to https://phabricator.wikimedia.org/P68483 and previous config saved to /var/cache/conftool/dbconfig/20240831-141150-marostegui.json
14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T371742)', diff saved to https://phabricator.wikimedia.org/P68482 and previous config saved to /var/cache/conftool/dbconfig/20240831-141011-ladsgroup.json
14:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2218.codfw.wmnet with reason: Maintenance
14:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2218.codfw.wmnet with reason: Maintenance
14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T371742)', diff saved to https://phabricator.wikimedia.org/P68481 and previous config saved to /var/cache/conftool/dbconfig/20240831-140949-ladsgroup.json
14:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P68480 and previous config saved to /var/cache/conftool/dbconfig/20240831-140311-ladsgroup.json
13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P68479 and previous config saved to /var/cache/conftool/dbconfig/20240831-135442-ladsgroup.json
13:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68478 and previous config saved to /var/cache/conftool/dbconfig/20240831-134804-ladsgroup.json
13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P68477 and previous config saved to /var/cache/conftool/dbconfig/20240831-133935-ladsgroup.json
13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68476 and previous config saved to /var/cache/conftool/dbconfig/20240831-133349-ladsgroup.json
13:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2212.codfw.wmnet with reason: Maintenance
13:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2212.codfw.wmnet with reason: Maintenance
13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T371742)', diff saved to https://phabricator.wikimedia.org/P68475 and previous config saved to /var/cache/conftool/dbconfig/20240831-132428-ladsgroup.json
13:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2202.codfw.wmnet with reason: Maintenance
13:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2202.codfw.wmnet with reason: Maintenance
13:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68474 and previous config saved to /var/cache/conftool/dbconfig/20240831-131907-ladsgroup.json
13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P68473 and previous config saved to /var/cache/conftool/dbconfig/20240831-130400-ladsgroup.json
12:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P68472 and previous config saved to /var/cache/conftool/dbconfig/20240831-124853-ladsgroup.json
12:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68471 and previous config saved to /var/cache/conftool/dbconfig/20240831-123346-ladsgroup.json
12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T371742)', diff saved to https://phabricator.wikimedia.org/P68470 and previous config saved to /var/cache/conftool/dbconfig/20240831-122900-ladsgroup.json
12:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2208.codfw.wmnet with reason: Maintenance
12:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2208.codfw.wmnet with reason: Maintenance
12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68469 and previous config saved to /var/cache/conftool/dbconfig/20240831-121937-ladsgroup.json
12:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2188.codfw.wmnet with reason: Maintenance
12:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2188.codfw.wmnet with reason: Maintenance
12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T370903)', diff saved to https://phabricator.wikimedia.org/P68468 and previous config saved to /var/cache/conftool/dbconfig/20240831-121915-ladsgroup.json
12:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P68467 and previous config saved to /var/cache/conftool/dbconfig/20240831-120409-ladsgroup.json
11:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P68466 and previous config saved to /var/cache/conftool/dbconfig/20240831-114902-ladsgroup.json
11:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
11:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
11:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T370903)', diff saved to https://phabricator.wikimedia.org/P68465 and previous config saved to /var/cache/conftool/dbconfig/20240831-113355-ladsgroup.json
11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T370903)', diff saved to https://phabricator.wikimedia.org/P68464 and previous config saved to /var/cache/conftool/dbconfig/20240831-111528-ladsgroup.json
11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T370903)', diff saved to https://phabricator.wikimedia.org/P68463 and previous config saved to /var/cache/conftool/dbconfig/20240831-111506-ladsgroup.json
10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P68462 and previous config saved to /var/cache/conftool/dbconfig/20240831-105959-ladsgroup.json
10:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
10:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
10:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T371742)', diff saved to https://phabricator.wikimedia.org/P68461 and previous config saved to /var/cache/conftool/dbconfig/20240831-104829-ladsgroup.json
10:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P68460 and previous config saved to /var/cache/conftool/dbconfig/20240831-104452-ladsgroup.json
10:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P68459 and previous config saved to /var/cache/conftool/dbconfig/20240831-103322-ladsgroup.json
10:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T370903)', diff saved to https://phabricator.wikimedia.org/P68458 and previous config saved to /var/cache/conftool/dbconfig/20240831-102944-ladsgroup.json
10:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P68457 and previous config saved to /var/cache/conftool/dbconfig/20240831-101815-ladsgroup.json
10:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T370903)', diff saved to https://phabricator.wikimedia.org/P68456 and previous config saved to /var/cache/conftool/dbconfig/20240831-101131-ladsgroup.json
10:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
10:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
10:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T370903)', diff saved to https://phabricator.wikimedia.org/P68455 and previous config saved to /var/cache/conftool/dbconfig/20240831-101109-ladsgroup.json
10:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T371742)', diff saved to https://phabricator.wikimedia.org/P68454 and previous config saved to /var/cache/conftool/dbconfig/20240831-100308-ladsgroup.json
09:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P68453 and previous config saved to /var/cache/conftool/dbconfig/20240831-095602-ladsgroup.json
09:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P68452 and previous config saved to /var/cache/conftool/dbconfig/20240831-094055-ladsgroup.json
09:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T370903)', diff saved to https://phabricator.wikimedia.org/P68451 and previous config saved to /var/cache/conftool/dbconfig/20240831-092548-ladsgroup.json
09:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T370903)', diff saved to https://phabricator.wikimedia.org/P68450 and previous config saved to /var/cache/conftool/dbconfig/20240831-090843-ladsgroup.json
09:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
09:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
09:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
09:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
09:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T370903)', diff saved to https://phabricator.wikimedia.org/P68449 and previous config saved to /var/cache/conftool/dbconfig/20240831-090817-ladsgroup.json
09:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T371742)', diff saved to https://phabricator.wikimedia.org/P68448 and previous config saved to /var/cache/conftool/dbconfig/20240831-090155-ladsgroup.json
09:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
09:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
09:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T371742)', diff saved to https://phabricator.wikimedia.org/P68447 and previous config saved to /var/cache/conftool/dbconfig/20240831-090133-ladsgroup.json
08:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P68446 and previous config saved to /var/cache/conftool/dbconfig/20240831-085310-ladsgroup.json
08:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P68445 and previous config saved to /var/cache/conftool/dbconfig/20240831-084626-ladsgroup.json
08:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P68444 and previous config saved to /var/cache/conftool/dbconfig/20240831-083803-ladsgroup.json
08:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P68443 and previous config saved to /var/cache/conftool/dbconfig/20240831-083118-ladsgroup.json
08:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T370903)', diff saved to https://phabricator.wikimedia.org/P68442 and previous config saved to /var/cache/conftool/dbconfig/20240831-082256-ladsgroup.json
08:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T371742)', diff saved to https://phabricator.wikimedia.org/P68441 and previous config saved to /var/cache/conftool/dbconfig/20240831-081611-ladsgroup.json
08:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T370903)', diff saved to https://phabricator.wikimedia.org/P68440 and previous config saved to /var/cache/conftool/dbconfig/20240831-080733-ladsgroup.json
08:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
08:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
08:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T370903)', diff saved to https://phabricator.wikimedia.org/P68439 and previous config saved to /var/cache/conftool/dbconfig/20240831-080700-ladsgroup.json
07:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P68438 and previous config saved to /var/cache/conftool/dbconfig/20240831-075152-ladsgroup.json
07:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P68437 and previous config saved to /var/cache/conftool/dbconfig/20240831-073645-ladsgroup.json
07:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T370903)', diff saved to https://phabricator.wikimedia.org/P68436 and previous config saved to /var/cache/conftool/dbconfig/20240831-072138-ladsgroup.json
07:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T371742)', diff saved to https://phabricator.wikimedia.org/P68435 and previous config saved to /var/cache/conftool/dbconfig/20240831-071243-ladsgroup.json
07:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
07:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
07:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T371742)', diff saved to https://phabricator.wikimedia.org/P68434 and previous config saved to /var/cache/conftool/dbconfig/20240831-071221-ladsgroup.json
07:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T370903)', diff saved to https://phabricator.wikimedia.org/P68433 and previous config saved to /var/cache/conftool/dbconfig/20240831-070333-ladsgroup.json
07:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
07:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
07:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T370903)', diff saved to https://phabricator.wikimedia.org/P68432 and previous config saved to /var/cache/conftool/dbconfig/20240831-070311-ladsgroup.json
06:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P68431 and previous config saved to /var/cache/conftool/dbconfig/20240831-065714-ladsgroup.json
06:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P68430 and previous config saved to /var/cache/conftool/dbconfig/20240831-064803-ladsgroup.json
06:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P68429 and previous config saved to /var/cache/conftool/dbconfig/20240831-064207-ladsgroup.json
06:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P68428 and previous config saved to /var/cache/conftool/dbconfig/20240831-063256-ladsgroup.json
06:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T371742)', diff saved to https://phabricator.wikimedia.org/P68427 and previous config saved to /var/cache/conftool/dbconfig/20240831-062659-ladsgroup.json
06:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T370903)', diff saved to https://phabricator.wikimedia.org/P68426 and previous config saved to /var/cache/conftool/dbconfig/20240831-061749-ladsgroup.json
05:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T370903)', diff saved to https://phabricator.wikimedia.org/P68425 and previous config saved to /var/cache/conftool/dbconfig/20240831-055741-ladsgroup.json
05:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
05:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
05:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T370903)', diff saved to https://phabricator.wikimedia.org/P68424 and previous config saved to /var/cache/conftool/dbconfig/20240831-055719-ladsgroup.json
05:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P68423 and previous config saved to /var/cache/conftool/dbconfig/20240831-054211-ladsgroup.json
05:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P68422 and previous config saved to /var/cache/conftool/dbconfig/20240831-052704-ladsgroup.json
05:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T371742)', diff saved to https://phabricator.wikimedia.org/P68421 and previous config saved to /var/cache/conftool/dbconfig/20240831-052543-ladsgroup.json
05:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
05:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
05:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
05:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
05:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T371742)', diff saved to https://phabricator.wikimedia.org/P68420 and previous config saved to /var/cache/conftool/dbconfig/20240831-052516-ladsgroup.json
05:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T370903)', diff saved to https://phabricator.wikimedia.org/P68419 and previous config saved to /var/cache/conftool/dbconfig/20240831-051157-ladsgroup.json
05:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P68418 and previous config saved to /var/cache/conftool/dbconfig/20240831-051009-ladsgroup.json
04:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P68417 and previous config saved to /var/cache/conftool/dbconfig/20240831-045501-ladsgroup.json
04:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T370903)', diff saved to https://phabricator.wikimedia.org/P68416 and previous config saved to /var/cache/conftool/dbconfig/20240831-045435-ladsgroup.json
04:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
04:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
04:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T371742)', diff saved to https://phabricator.wikimedia.org/P68415 and previous config saved to /var/cache/conftool/dbconfig/20240831-043954-ladsgroup.json
04:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
04:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
04:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T370903)', diff saved to https://phabricator.wikimedia.org/P68414 and previous config saved to /var/cache/conftool/dbconfig/20240831-043621-ladsgroup.json
04:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P68413 and previous config saved to /var/cache/conftool/dbconfig/20240831-042114-ladsgroup.json
04:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P68412 and previous config saved to /var/cache/conftool/dbconfig/20240831-040607-ladsgroup.json
03:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T370903)', diff saved to https://phabricator.wikimedia.org/P68411 and previous config saved to /var/cache/conftool/dbconfig/20240831-035100-ladsgroup.json
03:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T371742)', diff saved to https://phabricator.wikimedia.org/P68410 and previous config saved to /var/cache/conftool/dbconfig/20240831-033831-ladsgroup.json
03:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
03:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
03:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T371742)', diff saved to https://phabricator.wikimedia.org/P68409 and previous config saved to /var/cache/conftool/dbconfig/20240831-033809-ladsgroup.json
03:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T370903)', diff saved to https://phabricator.wikimedia.org/P68408 and previous config saved to /var/cache/conftool/dbconfig/20240831-033310-ladsgroup.json
03:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
03:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T370903)', diff saved to https://phabricator.wikimedia.org/P68407 and previous config saved to /var/cache/conftool/dbconfig/20240831-033248-ladsgroup.json
03:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P68406 and previous config saved to /var/cache/conftool/dbconfig/20240831-032302-ladsgroup.json
03:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P68405 and previous config saved to /var/cache/conftool/dbconfig/20240831-031741-ladsgroup.json
03:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P68404 and previous config saved to /var/cache/conftool/dbconfig/20240831-030755-ladsgroup.json
03:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P68403 and previous config saved to /var/cache/conftool/dbconfig/20240831-030234-ladsgroup.json
02:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T371742)', diff saved to https://phabricator.wikimedia.org/P68402 and previous config saved to /var/cache/conftool/dbconfig/20240831-025248-ladsgroup.json
02:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T370903)', diff saved to https://phabricator.wikimedia.org/P68401 and previous config saved to /var/cache/conftool/dbconfig/20240831-024727-ladsgroup.json
02:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T370903)', diff saved to https://phabricator.wikimedia.org/P68400 and previous config saved to /var/cache/conftool/dbconfig/20240831-022822-ladsgroup.json
02:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
02:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
02:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
02:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
01:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
01:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T371742)', diff saved to https://phabricator.wikimedia.org/P68399 and previous config saved to /var/cache/conftool/dbconfig/20240831-015132-ladsgroup.json
01:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
01:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T371742)', diff saved to https://phabricator.wikimedia.org/P68398 and previous config saved to /var/cache/conftool/dbconfig/20240831-015110-ladsgroup.json
01:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P68397 and previous config saved to /var/cache/conftool/dbconfig/20240831-013603-ladsgroup.json
01:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
01:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T370903)', diff saved to https://phabricator.wikimedia.org/P68396 and previous config saved to /var/cache/conftool/dbconfig/20240831-013254-ladsgroup.json
01:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P68395 and previous config saved to /var/cache/conftool/dbconfig/20240831-012055-ladsgroup.json
01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P68394 and previous config saved to /var/cache/conftool/dbconfig/20240831-011746-ladsgroup.json
01:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T371742)', diff saved to https://phabricator.wikimedia.org/P68393 and previous config saved to /var/cache/conftool/dbconfig/20240831-010548-ladsgroup.json
01:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P68392 and previous config saved to /var/cache/conftool/dbconfig/20240831-010239-ladsgroup.json
00:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T370903)', diff saved to https://phabricator.wikimedia.org/P68391 and previous config saved to /var/cache/conftool/dbconfig/20240831-004732-ladsgroup.json
00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T370903)', diff saved to https://phabricator.wikimedia.org/P68390 and previous config saved to /var/cache/conftool/dbconfig/20240831-002842-ladsgroup.json
00:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1235.eqiad.wmnet with reason: Maintenance
00:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1235.eqiad.wmnet with reason: Maintenance
00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T370903)', diff saved to https://phabricator.wikimedia.org/P68389 and previous config saved to /var/cache/conftool/dbconfig/20240831-002819-ladsgroup.json
00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P68388 and previous config saved to /var/cache/conftool/dbconfig/20240831-001312-ladsgroup.json
00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2121 (T371742)', diff saved to https://phabricator.wikimedia.org/P68387 and previous config saved to /var/cache/conftool/dbconfig/20240831-000400-ladsgroup.json
00:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
00:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance

2024-08-30

23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P68386 and previous config saved to /var/cache/conftool/dbconfig/20240830-235804-ladsgroup.json
23:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
23:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
23:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T371742)', diff saved to https://phabricator.wikimedia.org/P68385 and previous config saved to /var/cache/conftool/dbconfig/20240830-234621-ladsgroup.json
23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T370903)', diff saved to https://phabricator.wikimedia.org/P68384 and previous config saved to /var/cache/conftool/dbconfig/20240830-234257-ladsgroup.json
23:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P68383 and previous config saved to /var/cache/conftool/dbconfig/20240830-233113-ladsgroup.json
23:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P68382 and previous config saved to /var/cache/conftool/dbconfig/20240830-231606-ladsgroup.json
23:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T371742)', diff saved to https://phabricator.wikimedia.org/P68381 and previous config saved to /var/cache/conftool/dbconfig/20240830-230059-ladsgroup.json
22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T370903)', diff saved to https://phabricator.wikimedia.org/P68380 and previous config saved to /var/cache/conftool/dbconfig/20240830-225902-ladsgroup.json
22:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1234.eqiad.wmnet with reason: Maintenance
22:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1234.eqiad.wmnet with reason: Maintenance
22:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T370903)', diff saved to https://phabricator.wikimedia.org/P68379 and previous config saved to /var/cache/conftool/dbconfig/20240830-225840-ladsgroup.json
22:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P68378 and previous config saved to /var/cache/conftool/dbconfig/20240830-224333-ladsgroup.json
22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P68377 and previous config saved to /var/cache/conftool/dbconfig/20240830-222826-ladsgroup.json
22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T370903)', diff saved to https://phabricator.wikimedia.org/P68376 and previous config saved to /var/cache/conftool/dbconfig/20240830-221319-ladsgroup.json
21:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T370903)', diff saved to https://phabricator.wikimedia.org/P68375 and previous config saved to /var/cache/conftool/dbconfig/20240830-215611-ladsgroup.json
21:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1232.eqiad.wmnet with reason: Maintenance
21:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1232.eqiad.wmnet with reason: Maintenance
21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T370903)', diff saved to https://phabricator.wikimedia.org/P68374 and previous config saved to /var/cache/conftool/dbconfig/20240830-215549-ladsgroup.json
21:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T371742)', diff saved to https://phabricator.wikimedia.org/P68373 and previous config saved to /var/cache/conftool/dbconfig/20240830-214558-ladsgroup.json
21:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
21:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T371742)', diff saved to https://phabricator.wikimedia.org/P68372 and previous config saved to /var/cache/conftool/dbconfig/20240830-214536-ladsgroup.json
21:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P68371 and previous config saved to /var/cache/conftool/dbconfig/20240830-214042-ladsgroup.json
21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P68370 and previous config saved to /var/cache/conftool/dbconfig/20240830-213028-ladsgroup.json
21:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P68369 and previous config saved to /var/cache/conftool/dbconfig/20240830-212535-ladsgroup.json
21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P68368 and previous config saved to /var/cache/conftool/dbconfig/20240830-211521-ladsgroup.json
21:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T370903)', diff saved to https://phabricator.wikimedia.org/P68367 and previous config saved to /var/cache/conftool/dbconfig/20240830-211028-ladsgroup.json
21:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T371742)', diff saved to https://phabricator.wikimedia.org/P68366 and previous config saved to /var/cache/conftool/dbconfig/20240830-210014-ladsgroup.json
20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T370903)', diff saved to https://phabricator.wikimedia.org/P68365 and previous config saved to /var/cache/conftool/dbconfig/20240830-201956-ladsgroup.json
20:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1219.eqiad.wmnet with reason: Maintenance
20:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1219.eqiad.wmnet with reason: Maintenance
20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68364 and previous config saved to /var/cache/conftool/dbconfig/20240830-201934-ladsgroup.json
20:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T371742)', diff saved to https://phabricator.wikimedia.org/P68363 and previous config saved to /var/cache/conftool/dbconfig/20240830-200606-ladsgroup.json
20:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
20:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
20:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T371742)', diff saved to https://phabricator.wikimedia.org/P68362 and previous config saved to /var/cache/conftool/dbconfig/20240830-200544-ladsgroup.json
20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P68361 and previous config saved to /var/cache/conftool/dbconfig/20240830-200427-ladsgroup.json
19:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P68359 and previous config saved to /var/cache/conftool/dbconfig/20240830-195037-ladsgroup.json
19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P68358 and previous config saved to /var/cache/conftool/dbconfig/20240830-194919-ladsgroup.json
19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P68357 and previous config saved to /var/cache/conftool/dbconfig/20240830-193528-ladsgroup.json
19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68356 and previous config saved to /var/cache/conftool/dbconfig/20240830-193413-ladsgroup.json
19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T371742)', diff saved to https://phabricator.wikimedia.org/P68355 and previous config saved to /var/cache/conftool/dbconfig/20240830-192021-ladsgroup.json
18:59 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1014.eqiad.wmnet
18:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68354 and previous config saved to /var/cache/conftool/dbconfig/20240830-185427-ladsgroup.json
18:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
18:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
18:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68353 and previous config saved to /var/cache/conftool/dbconfig/20240830-185405-ladsgroup.json
18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T371742)', diff saved to https://phabricator.wikimedia.org/P68352 and previous config saved to /var/cache/conftool/dbconfig/20240830-185341-ladsgroup.json
18:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
18:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T371742)', diff saved to https://phabricator.wikimedia.org/P68351 and previous config saved to /var/cache/conftool/dbconfig/20240830-185319-ladsgroup.json
18:51 eevans@cumin1002: START - Cookbook sre.hosts.reboot-single for host aqs1014.eqiad.wmnet
18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P68350 and previous config saved to /var/cache/conftool/dbconfig/20240830-183858-ladsgroup.json
18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P68349 and previous config saved to /var/cache/conftool/dbconfig/20240830-183812-ladsgroup.json
18:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P68348 and previous config saved to /var/cache/conftool/dbconfig/20240830-182350-ladsgroup.json
18:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P68347 and previous config saved to /var/cache/conftool/dbconfig/20240830-182304-ladsgroup.json
18:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68346 and previous config saved to /var/cache/conftool/dbconfig/20240830-180843-ladsgroup.json
18:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T371742)', diff saved to https://phabricator.wikimedia.org/P68345 and previous config saved to /var/cache/conftool/dbconfig/20240830-180757-ladsgroup.json
17:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68344 and previous config saved to /var/cache/conftool/dbconfig/20240830-174822-ladsgroup.json
17:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
17:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
17:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T370903)', diff saved to https://phabricator.wikimedia.org/P68343 and previous config saved to /var/cache/conftool/dbconfig/20240830-174800-ladsgroup.json
17:44 mutante: releases1003/2003 - sudo apt-get remove openjdk-11-* - Java 11 has been replaced by Java 17 - T359795
17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T371742)', diff saved to https://phabricator.wikimedia.org/P68342 and previous config saved to /var/cache/conftool/dbconfig/20240830-173905-ladsgroup.json
17:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
17:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
17:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T371742)', diff saved to https://phabricator.wikimedia.org/P68341 and previous config saved to /var/cache/conftool/dbconfig/20240830-173843-ladsgroup.json
17:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P68340 and previous config saved to /var/cache/conftool/dbconfig/20240830-173253-ladsgroup.json
17:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P68339 and previous config saved to /var/cache/conftool/dbconfig/20240830-172336-ladsgroup.json
17:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P68338 and previous config saved to /var/cache/conftool/dbconfig/20240830-171745-ladsgroup.json
17:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P68337 and previous config saved to /var/cache/conftool/dbconfig/20240830-170829-ladsgroup.json
17:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T370903)', diff saved to https://phabricator.wikimedia.org/P68336 and previous config saved to /var/cache/conftool/dbconfig/20240830-170238-ladsgroup.json
16:59 swfrench-wmf: running homer 'cr*codfw*' commit 'T372878'
16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T371742)', diff saved to https://phabricator.wikimedia.org/P68335 and previous config saved to /var/cache/conftool/dbconfig/20240830-165322-ladsgroup.json
16:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T370903)', diff saved to https://phabricator.wikimedia.org/P68334 and previous config saved to /var/cache/conftool/dbconfig/20240830-164425-ladsgroup.json
16:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
16:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
16:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T370903)', diff saved to https://phabricator.wikimedia.org/P68333 and previous config saved to /var/cache/conftool/dbconfig/20240830-164403-ladsgroup.json
16:42 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2065.codfw.wmnet
16:42 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2065.codfw.wmnet
16:42 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2064.codfw.wmnet
16:42 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2064.codfw.wmnet
16:40 swfrench-wmf: running homer 'lsw1-b3-codfw*' commit 'T372878'
16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1033.eqiad.wmnet
16:39 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1033.eqiad.wmnet
16:39 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2065.codfw.wmnet with OS bullseye
16:32 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2064.codfw.wmnet with OS bullseye
16:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P68332 and previous config saved to /var/cache/conftool/dbconfig/20240830-162856-ladsgroup.json
16:26 claime: homer 'cr*eqiad*' commit 'T351074, T372878, and fix ml-serve and dse-k8s bgp'
16:23 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2059.codfw.wmnet
16:23 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2059.codfw.wmnet
16:23 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2058.codfw.wmnet
16:23 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2058.codfw.wmnet
16:23 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2057.codfw.wmnet
16:23 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2057.codfw.wmnet
16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T371742)', diff saved to https://phabricator.wikimedia.org/P68331 and previous config saved to /var/cache/conftool/dbconfig/20240830-162258-ladsgroup.json
16:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
16:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
16:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T371742)', diff saved to https://phabricator.wikimedia.org/P68330 and previous config saved to /var/cache/conftool/dbconfig/20240830-162236-ladsgroup.json
16:21 claime: flipping BGP flag to true in netbox for ml-serve-ctrl100[1-2],ml-serve100[1-4],dse-k8s-ctrl100[1-2],dse-k8s-worker100[1-4]
16:19 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2065.codfw.wmnet with reason: host reimage
16:15 swfrench@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2065.codfw.wmnet with reason: host reimage
16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P68329 and previous config saved to /var/cache/conftool/dbconfig/20240830-161349-ladsgroup.json
16:12 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2064.codfw.wmnet with reason: host reimage
16:09 swfrench@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2064.codfw.wmnet with reason: host reimage
16:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2062.codfw.wmnet
16:07 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2062.codfw.wmnet
16:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P68328 and previous config saved to /var/cache/conftool/dbconfig/20240830-160729-ladsgroup.json
16:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2061.codfw.wmnet
16:07 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2061.codfw.wmnet
16:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2060.codfw.wmnet
16:02 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2060.codfw.wmnet
16:01 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2061.codfw.wmnet with OS bullseye
15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T370903)', diff saved to https://phabricator.wikimedia.org/P68326 and previous config saved to /var/cache/conftool/dbconfig/20240830-155842-ladsgroup.json
15:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1033.eqiad.wmnet with OS bullseye
15:57 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2065
15:56 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2065
15:56 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2065
15:56 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2065.codfw.wmnet 235.16.192.10.in-addr.arpa 5.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:56 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2065.codfw.wmnet 235.16.192.10.in-addr.arpa 5.3.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:56 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:56 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2065 - swfrench@cumin2002"
15:56 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2065 - swfrench@cumin2002"
15:55 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2062.codfw.wmnet with OS bullseye
15:53 hnowlan: homer 'lsw1-a3-codfw*' commit
15:52 swfrench@cumin2002: START - Cookbook sre.dns.netbox
15:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P68325 and previous config saved to /var/cache/conftool/dbconfig/20240830-155222-ladsgroup.json
15:52 swfrench@cumin2002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2065
15:52 swfrench@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2065.codfw.wmnet with OS bullseye
15:50 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2064
15:50 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2064
15:50 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2064
15:50 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2064.codfw.wmnet 211.16.192.10.in-addr.arpa 1.1.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:50 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2064.codfw.wmnet 211.16.192.10.in-addr.arpa 1.1.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:50 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:49 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2064 - swfrench@cumin2002"
15:49 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2064 - swfrench@cumin2002"
15:49 claime: homer 'cr*eqiad*' commit 'T351074'
15:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2063.codfw.wmnet
15:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2063.codfw.wmnet
15:47 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2060.codfw.wmnet with OS bullseye
15:46 swfrench@cumin2002: START - Cookbook sre.dns.netbox
15:45 swfrench@cumin2002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2064
15:45 swfrench@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2064.codfw.wmnet with OS bullseye
15:44 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2064.codfw.wmnet wikikube-worker2065.codfw.wmnet on all recursors
15:44 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2064.codfw.wmnet wikikube-worker2065.codfw.wmnet on all recursors
15:44 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2057 to wikikube-worker2065
15:43 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2065
15:43 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2065
15:43 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:43 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2057 to wikikube-worker2065 - swfrench@cumin2002"
15:42 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2057 to wikikube-worker2065 - swfrench@cumin2002"
15:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2061.codfw.wmnet with reason: host reimage
15:41 claime: homer 'lsw1-a3-codfw*' commit 'T351074'
15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T370903)', diff saved to https://phabricator.wikimedia.org/P68323 and previous config saved to /var/cache/conftool/dbconfig/20240830-154054-ladsgroup.json
15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2063.codfw.wmnet with OS bullseye
15:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T370903)', diff saved to https://phabricator.wikimedia.org/P68322 and previous config saved to /var/cache/conftool/dbconfig/20240830-154004-ladsgroup.json
15:39 swfrench@cumin2002: START - Cookbook sre.dns.netbox
15:39 swfrench@cumin2002: START - Cookbook sre.hosts.rename from kubernetes2057 to wikikube-worker2065
15:38 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2061.codfw.wmnet with reason: host reimage
15:38 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2030 to wikikube-worker2064
15:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1033.eqiad.wmnet with reason: host reimage
15:37 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2064
15:37 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2064
15:37 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:37 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2030 to wikikube-worker2064 - swfrench@cumin2002"
15:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T371742)', diff saved to https://phabricator.wikimedia.org/P68320 and previous config saved to /var/cache/conftool/dbconfig/20240830-153715-ladsgroup.json
15:37 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2030 to wikikube-worker2064 - swfrench@cumin2002"
15:35 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2062.codfw.wmnet with reason: host reimage
15:33 swfrench@cumin2002: START - Cookbook sre.dns.netbox
15:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1033.eqiad.wmnet with reason: host reimage
15:33 swfrench@cumin2002: START - Cookbook sre.hosts.rename from kubernetes2030 to wikikube-worker2064
15:31 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2062.codfw.wmnet with reason: host reimage
15:29 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2057.codfw.wmnet
15:29 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2057.codfw.wmnet
15:28 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2030.codfw.wmnet
15:28 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2030.codfw.wmnet
15:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2060.codfw.wmnet with reason: host reimage
15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P68319 and previous config saved to /var/cache/conftool/dbconfig/20240830-152457-ladsgroup.json
15:23 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2060.codfw.wmnet with reason: host reimage
15:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2061
15:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2061
15:20 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2061
15:20 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2061.codfw.wmnet 47.0.192.10.in-addr.arpa 7.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:20 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2061.codfw.wmnet 47.0.192.10.in-addr.arpa 7.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:20 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:20 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2061 - hnowlan@cumin1002"
15:20 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2061 - hnowlan@cumin1002"
15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2063.codfw.wmnet with reason: host reimage
15:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1033.eqiad.wmnet with OS bullseye
15:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1033.eqiad.wmnet on all recursors
15:19 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1033.eqiad.wmnet on all recursors
15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1398 to wikikube-worker1033
15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1033
15:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1033
15:17 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
15:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1398 to wikikube-worker1033 - cgoubert@cumin1002"
15:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2063.codfw.wmnet with reason: host reimage
15:17 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1398 to wikikube-worker1033 - cgoubert@cumin1002"
15:16 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2061
15:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2062
15:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2062
15:15 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2062
15:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2062.codfw.wmnet 48.0.192.10.in-addr.arpa 8.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:15 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2062.codfw.wmnet 48.0.192.10.in-addr.arpa 8.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2062 - hnowlan@cumin1002"
15:13 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:13 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2062 - hnowlan@cumin1002"
15:12 klausman@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:12 klausman@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:11 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:11 klausman@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T371742)', diff saved to https://phabricator.wikimedia.org/P68318 and previous config saved to /var/cache/conftool/dbconfig/20240830-151128-ladsgroup.json
15:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P68317 and previous config saved to /var/cache/conftool/dbconfig/20240830-150950-ladsgroup.json
15:08 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1398 to wikikube-worker1033
15:08 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:08 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
15:08 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2060.codfw.wmnet with OS bullseye
15:07 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:07 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:07 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2060.codfw.wmnet with OS bullseye
15:07 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2062
15:07 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:07 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2060.codfw.wmnet with OS bullseye
15:07 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2061.codfw.wmnet with OS bullseye
15:07 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2062.codfw.wmnet with OS bullseye
15:06 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:06 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
15:05 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2383 to wikikube-worker2060
15:04 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2060
15:02 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2060
15:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2063
15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2063
15:00 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
14:58 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:58 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2384 to wikikube-worker2061
14:58 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2063
14:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2063.codfw.wmnet 169.0.192.10.in-addr.arpa 9.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:58 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2063.codfw.wmnet 169.0.192.10.in-addr.arpa 9.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:58 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
14:58 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:58 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2061
14:57 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:57 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2061
14:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2384 to wikikube-worker2061 - hnowlan@cumin1002"
14:57 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:57 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2384 to wikikube-worker2061 - hnowlan@cumin1002"
14:56 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:56 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T370903)', diff saved to https://phabricator.wikimedia.org/P68316 and previous config saved to /var/cache/conftool/dbconfig/20240830-145442-ladsgroup.json
14:51 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2385 to wikikube-worker2062
14:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2062
14:50 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
14:50 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2062
14:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2385 to wikikube-worker2062 - hnowlan@cumin1002"
14:50 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2385 to wikikube-worker2062 - hnowlan@cumin1002"
14:49 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2063
14:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2063.codfw.wmnet with OS bullseye
14:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2299 to wikikube-worker2063
14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2063
14:47 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2063
14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2299 to wikikube-worker2063 - cgoubert@cumin1002"
14:46 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2299 to wikikube-worker2063 - cgoubert@cumin1002"
14:46 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
14:44 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2058.codfw.wmnet with OS bullseye
14:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2385 to wikikube-worker2062
14:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2384 to wikikube-worker2061
14:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:40 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2383 to wikikube-worker2060
14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2299 to wikikube-worker2063
14:38 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2057.codfw.wmnet with OS bullseye
14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T370903)', diff saved to https://phabricator.wikimedia.org/P68315 and previous config saved to /var/cache/conftool/dbconfig/20240830-143537-ladsgroup.json
14:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1195.eqiad.wmnet with reason: Maintenance
14:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1195.eqiad.wmnet with reason: Maintenance
14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T370903)', diff saved to https://phabricator.wikimedia.org/P68314 and previous config saved to /var/cache/conftool/dbconfig/20240830-143516-ladsgroup.json
14:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2059.codfw.wmnet with OS bullseye
14:31 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2385.codfw.wmnet
14:31 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2385.codfw.wmnet
14:30 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2384.codfw.wmnet
14:30 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2384.codfw.wmnet
14:28 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2383.codfw.wmnet
14:28 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2383.codfw.wmnet
14:24 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2058.codfw.wmnet with reason: host reimage
14:22 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2058.codfw.wmnet with reason: host reimage
14:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P68313 and previous config saved to /var/cache/conftool/dbconfig/20240830-142008-ladsgroup.json
14:18 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2057.codfw.wmnet with reason: host reimage
14:15 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2057.codfw.wmnet with reason: host reimage
14:14 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2059.codfw.wmnet with reason: host reimage
14:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T371742)', diff saved to https://phabricator.wikimedia.org/P68312 and previous config saved to /var/cache/conftool/dbconfig/20240830-141311-ladsgroup.json
14:11 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2059.codfw.wmnet with reason: host reimage
14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2058
14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2058
14:06 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2058
14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2058.codfw.wmnet 41.0.192.10.in-addr.arpa 1.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:06 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2058.codfw.wmnet 41.0.192.10.in-addr.arpa 1.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2058 - akosiaris@cumin1002"
14:06 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2058 - akosiaris@cumin1002"
14:05 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2056.codfw.wmnet
14:05 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2056.codfw.wmnet
14:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P68311 and previous config saved to /var/cache/conftool/dbconfig/20240830-140501-ladsgroup.json
14:03 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
13:58 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2058
13:58 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2057
13:58 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2057
13:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P68310 and previous config saved to /var/cache/conftool/dbconfig/20240830-135804-ladsgroup.json
13:57 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2057
13:57 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2057.codfw.wmnet 40.0.192.10.in-addr.arpa 0.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:56 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2057.codfw.wmnet 40.0.192.10.in-addr.arpa 0.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:56 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:56 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2057 - akosiaris@cumin1002"
13:56 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2057 - akosiaris@cumin1002"
13:55 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2059.codfw.wmnet with OS bullseye
13:53 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2059.codfw.wmnet with OS bullseye
13:53 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
13:53 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2059.codfw.wmnet with OS bullseye
13:53 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2058.codfw.wmnet with OS bullseye
13:52 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2057
13:52 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2057.codfw.wmnet with OS bullseye
13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T370903)', diff saved to https://phabricator.wikimedia.org/P68309 and previous config saved to /var/cache/conftool/dbconfig/20240830-134954-ladsgroup.json
13:46 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2379 to wikikube-worker2059
13:45 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2059
13:45 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2003.codfw.wmnet
13:45 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2003.codfw.wmnet
13:45 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2001.codfw.wmnet
13:45 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2001.codfw.wmnet
13:45 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2059
13:45 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:45 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2379 to wikikube-worker2059 - akosiaris@cumin1002"
13:43 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
13:43 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
13:43 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2001.codfw.wmnet
13:43 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2001.codfw.wmnet
13:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P68308 and previous config saved to /var/cache/conftool/dbconfig/20240830-134257-ladsgroup.json
13:42 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2379 to wikikube-worker2059 - akosiaris@cumin1002"
13:41 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2001.codfw.wmnet
13:41 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2001.codfw.wmnet
13:40 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2003.codfw.wmnet
13:40 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2003.codfw.wmnet
13:38 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
13:38 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
13:35 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2003.codfw.wmnet
13:35 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2003.codfw.wmnet
13:34 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
13:34 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2379 to wikikube-worker2059
13:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2378 to wikikube-worker2058
13:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2058
13:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T370903)', diff saved to https://phabricator.wikimedia.org/P68307 and previous config saved to /var/cache/conftool/dbconfig/20240830-133201-ladsgroup.json
13:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
13:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T370903)', diff saved to https://phabricator.wikimedia.org/P68306 and previous config saved to /var/cache/conftool/dbconfig/20240830-133139-ladsgroup.json
13:31 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2058
13:31 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:31 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2378 to wikikube-worker2058 - akosiaris@cumin1002"
13:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T371742)', diff saved to https://phabricator.wikimedia.org/P68305 and previous config saved to /var/cache/conftool/dbconfig/20240830-132750-ladsgroup.json
13:27 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2378 to wikikube-worker2058 - akosiaris@cumin1002"
13:27 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker2001.codfw.wmnet
13:27 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2001.codfw.wmnet
13:26 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-ctrl2003.codfw.wmnet
13:26 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-ctrl2003.codfw.wmnet
13:21 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
13:21 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2378 to wikikube-worker2058
13:21 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-ctrl2003.codfw.wmnet
13:21 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-ctrl2003.codfw.wmnet
13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P68304 and previous config saved to /var/cache/conftool/dbconfig/20240830-131631-ladsgroup.json
13:04 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2377 to wikikube-worker2057
13:04 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2057
13:04 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2057
13:04 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:04 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2377 to wikikube-worker2057 - akosiaris@cumin1002"
13:02 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2377 to wikikube-worker2057 - akosiaris@cumin1002"
13:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P68303 and previous config saved to /var/cache/conftool/dbconfig/20240830-130124-ladsgroup.json
12:59 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
12:59 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2377 to wikikube-worker2057
12:56 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2056.codfw.wmnet with OS bullseye
12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T370903)', diff saved to https://phabricator.wikimedia.org/P68302 and previous config saved to /var/cache/conftool/dbconfig/20240830-124617-ladsgroup.json
12:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2056.codfw.wmnet with reason: host reimage
12:33 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2056.codfw.wmnet with reason: host reimage
12:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2055.codfw.wmnet
12:27 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2055.codfw.wmnet
12:25 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2055.codfw.wmnet with OS bullseye
12:24 hnowlan: homer 'lsw1-a3-codfw*' commit
12:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T371742)', diff saved to https://phabricator.wikimedia.org/P68301 and previous config saved to /var/cache/conftool/dbconfig/20240830-122139-ladsgroup.json
12:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
12:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
12:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T371742)', diff saved to https://phabricator.wikimedia.org/P68300 and previous config saved to /var/cache/conftool/dbconfig/20240830-122106-ladsgroup.json
12:20 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2379.codfw.wmnet
12:19 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2379.codfw.wmnet
12:19 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2378.codfw.wmnet
12:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2056
12:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2056
12:17 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2056
12:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2056.codfw.wmnet 45.0.192.10.in-addr.arpa 5.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
12:17 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2056.codfw.wmnet 45.0.192.10.in-addr.arpa 5.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
12:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2056 - hnowlan@cumin1002"
12:17 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2056 - hnowlan@cumin1002"
12:16 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2378.codfw.wmnet
12:16 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2377.codfw.wmnet
12:15 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2377.codfw.wmnet
12:13 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
12:13 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2056
12:13 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2056.codfw.wmnet with OS bullseye
12:12 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2056.codfw.wmnet on all recursors
12:12 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2056.codfw.wmnet on all recursors
12:11 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2055.codfw.wmnet on all recursors
12:11 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2055.codfw.wmnet on all recursors
12:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2382 to wikikube-worker2056
12:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2056
12:08 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2056
12:08 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:08 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2382 to wikikube-worker2056 - hnowlan@cumin1002"
12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T370903)', diff saved to https://phabricator.wikimedia.org/P68299 and previous config saved to /var/cache/conftool/dbconfig/20240830-120742-ladsgroup.json
12:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
12:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T370903)', diff saved to https://phabricator.wikimedia.org/P68298 and previous config saved to /var/cache/conftool/dbconfig/20240830-120720-ladsgroup.json
12:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2055.codfw.wmnet with reason: host reimage
12:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P68297 and previous config saved to /var/cache/conftool/dbconfig/20240830-120559-ladsgroup.json
12:04 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2382 to wikikube-worker2056 - hnowlan@cumin1002"
12:02 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2055.codfw.wmnet with reason: host reimage
12:01 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
12:00 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2382 to wikikube-worker2056
11:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2382.codfw.wmnet
11:56 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2382.codfw.wmnet
11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P68296 and previous config saved to /var/cache/conftool/dbconfig/20240830-115213-ladsgroup.json
11:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P68295 and previous config saved to /var/cache/conftool/dbconfig/20240830-115052-ladsgroup.json
11:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2055
11:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2055
11:46 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2055
11:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2055.codfw.wmnet 44.0.192.10.in-addr.arpa 4.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:46 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2055.codfw.wmnet 44.0.192.10.in-addr.arpa 4.4.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2055 - hnowlan@cumin1002"
11:46 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2055 - hnowlan@cumin1002"
11:42 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
11:42 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2055
11:41 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2055.codfw.wmnet with OS bullseye
11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2381 to wikikube-worker2055
11:39 hnowlan@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2055
11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P68294 and previous config saved to /var/cache/conftool/dbconfig/20240830-113706-ladsgroup.json
11:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T371742)', diff saved to https://phabricator.wikimedia.org/P68293 and previous config saved to /var/cache/conftool/dbconfig/20240830-113544-ladsgroup.json
11:35 hnowlan@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2055
11:35 hnowlan@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:35 hnowlan@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2381 to wikikube-worker2055 - hnowlan@cumin2002"
11:34 hnowlan@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2381 to wikikube-worker2055 - hnowlan@cumin2002"
11:29 hnowlan@cumin2002: START - Cookbook sre.dns.netbox
11:28 hnowlan@cumin2002: START - Cookbook sre.hosts.rename from mw2381 to wikikube-worker2055
11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T370903)', diff saved to https://phabricator.wikimedia.org/P68292 and previous config saved to /var/cache/conftool/dbconfig/20240830-112159-ladsgroup.json
11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T371742)', diff saved to https://phabricator.wikimedia.org/P68291 and previous config saved to /var/cache/conftool/dbconfig/20240830-110426-ladsgroup.json
11:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
11:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
11:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
11:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
11:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T370903)', diff saved to https://phabricator.wikimedia.org/P68290 and previous config saved to /var/cache/conftool/dbconfig/20240830-110334-ladsgroup.json
11:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
11:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
10:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2381.codfw.wmnet
10:44 Emperor: restart swift-proxy on ms-fe2009 and ms-fe2014 T360913
10:44 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2381.codfw.wmnet
10:27 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2054.codfw.wmnet
10:27 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2054.codfw.wmnet
10:27 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2053.codfw.wmnet
10:27 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2053.codfw.wmnet
10:27 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2052.codfw.wmnet
10:27 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2052.codfw.wmnet
10:06 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: sync
10:04 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: sync
09:59 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: sync
09:58 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/proton: sync
09:56 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: sync
09:55 elukey@deploy1003: helmfile [staging] START helmfile.d/services/proton: sync
09:51 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2054.codfw.wmnet with OS bullseye
09:45 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2053.codfw.wmnet with OS bullseye
09:43 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
09:43 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2052.codfw.wmnet with OS bullseye
09:42 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:39 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
09:31 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2054.codfw.wmnet with reason: host reimage
09:27 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
09:27 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2054.codfw.wmnet with reason: host reimage
09:25 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2053.codfw.wmnet with reason: host reimage
09:22 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2053.codfw.wmnet with reason: host reimage
09:22 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2052.codfw.wmnet with reason: host reimage
09:21 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
09:19 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2052.codfw.wmnet with reason: host reimage
09:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2054
09:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2054
09:10 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2054
09:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2054.codfw.wmnet 167.0.192.10.in-addr.arpa 7.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:10 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2054.codfw.wmnet 167.0.192.10.in-addr.arpa 7.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2054 - akosiaris@cumin1002"
09:10 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2054 - akosiaris@cumin1002"
09:07 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
09:07 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2054
09:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2053
09:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2053
09:06 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2053
09:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2053.codfw.wmnet 166.0.192.10.in-addr.arpa 6.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:06 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2053.codfw.wmnet 166.0.192.10.in-addr.arpa 6.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2053 - akosiaris@cumin1002"
09:06 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2053 - akosiaris@cumin1002"
09:04 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2054.codfw.wmnet with OS bullseye
09:03 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
09:03 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2053
09:03 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2052
09:03 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2052
09:03 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2052
09:02 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2052.codfw.wmnet 165.0.192.10.in-addr.arpa 5.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:02 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2052.codfw.wmnet 165.0.192.10.in-addr.arpa 5.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:02 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:02 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2052 - akosiaris@cumin1002"
09:02 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2052 - akosiaris@cumin1002"
09:02 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2053.codfw.wmnet with OS bullseye
08:59 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
08:59 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2052
08:59 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2052.codfw.wmnet with OS bullseye
08:56 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2297 to wikikube-worker2054
08:55 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2054
08:55 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2054
08:55 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:55 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2297 to wikikube-worker2054 - akosiaris@cumin1002"
08:52 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2297 to wikikube-worker2054 - akosiaris@cumin1002"
08:50 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided) (duration: 00m 41s)
08:50 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: sync
08:50 elukey@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: sync
08:50 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided)
08:48 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided) (duration: 00m 20s)
08:47 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@3d18901] (releasing): (no justification provided)
08:37 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
08:37 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2297 to wikikube-worker2054
08:36 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2296 to wikikube-worker2053
08:36 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2053
08:36 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2053
08:36 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:36 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2296 to wikikube-worker2053 - akosiaris@cumin1002"
08:35 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2296 to wikikube-worker2053 - akosiaris@cumin1002"
08:26 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
08:26 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2296 to wikikube-worker2053
08:24 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2295 to wikikube-worker2052
08:24 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2052
08:23 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2052
08:23 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:23 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2295 to wikikube-worker2052 - akosiaris@cumin1002"
08:23 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2295 to wikikube-worker2052 - akosiaris@cumin1002"
07:36 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
07:36 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2295 to wikikube-worker2052
07:22 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52965
07:22 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 52965
07:11 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:11 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
07:11 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:11 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:11 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
07:10 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
07:10 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
07:10 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
07:10 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
07:10 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
07:10 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
07:09 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
07:09 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
07:09 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
07:09 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
07:09 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
07:09 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
07:08 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
05:35 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@0321fda]: (no justification provided) (duration: 00m 32s)
05:34 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@0321fda]: (no justification provided)
04:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68289 and previous config saved to /var/cache/conftool/dbconfig/20240830-045519-ladsgroup.json
04:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P68288 and previous config saved to /var/cache/conftool/dbconfig/20240830-044012-ladsgroup.json
04:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P68287 and previous config saved to /var/cache/conftool/dbconfig/20240830-042505-ladsgroup.json
04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68286 and previous config saved to /var/cache/conftool/dbconfig/20240830-040957-ladsgroup.json
04:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T370903)', diff saved to https://phabricator.wikimedia.org/P68285 and previous config saved to /var/cache/conftool/dbconfig/20240830-040055-ladsgroup.json
04:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2207.codfw.wmnet with reason: Maintenance
04:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2207.codfw.wmnet with reason: Maintenance
03:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
03:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
03:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T370903)', diff saved to https://phabricator.wikimedia.org/P68284 and previous config saved to /var/cache/conftool/dbconfig/20240830-035123-ladsgroup.json
03:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P68283 and previous config saved to /var/cache/conftool/dbconfig/20240830-033616-ladsgroup.json
03:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P68282 and previous config saved to /var/cache/conftool/dbconfig/20240830-032109-ladsgroup.json
03:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T370903)', diff saved to https://phabricator.wikimedia.org/P68281 and previous config saved to /var/cache/conftool/dbconfig/20240830-030602-ladsgroup.json
02:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T370903)', diff saved to https://phabricator.wikimedia.org/P68280 and previous config saved to /var/cache/conftool/dbconfig/20240830-025809-ladsgroup.json
02:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2189.codfw.wmnet with reason: Maintenance
02:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2189.codfw.wmnet with reason: Maintenance
02:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68279 and previous config saved to /var/cache/conftool/dbconfig/20240830-025747-ladsgroup.json
02:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P68278 and previous config saved to /var/cache/conftool/dbconfig/20240830-024239-ladsgroup.json
02:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P68277 and previous config saved to /var/cache/conftool/dbconfig/20240830-022732-ladsgroup.json
02:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68276 and previous config saved to /var/cache/conftool/dbconfig/20240830-021225-ladsgroup.json
02:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68275 and previous config saved to /var/cache/conftool/dbconfig/20240830-020606-ladsgroup.json
02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68274 and previous config saved to /var/cache/conftool/dbconfig/20240830-020305-ladsgroup.json
02:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
02:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
02:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T370903)', diff saved to https://phabricator.wikimedia.org/P68273 and previous config saved to /var/cache/conftool/dbconfig/20240830-020243-ladsgroup.json
01:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P68272 and previous config saved to /var/cache/conftool/dbconfig/20240830-015059-ladsgroup.json
01:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P68271 and previous config saved to /var/cache/conftool/dbconfig/20240830-014736-ladsgroup.json
01:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P68270 and previous config saved to /var/cache/conftool/dbconfig/20240830-013551-ladsgroup.json
01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P68269 and previous config saved to /var/cache/conftool/dbconfig/20240830-013229-ladsgroup.json
01:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68268 and previous config saved to /var/cache/conftool/dbconfig/20240830-012044-ladsgroup.json
01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T370903)', diff saved to https://phabricator.wikimedia.org/P68267 and previous config saved to /var/cache/conftool/dbconfig/20240830-011721-ladsgroup.json
01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T370903)', diff saved to https://phabricator.wikimedia.org/P68266 and previous config saved to /var/cache/conftool/dbconfig/20240830-010823-ladsgroup.json
01:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
01:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T370903)', diff saved to https://phabricator.wikimedia.org/P68265 and previous config saved to /var/cache/conftool/dbconfig/20240830-010801-ladsgroup.json
00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68264 and previous config saved to /var/cache/conftool/dbconfig/20240830-005534-ladsgroup.json
00:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
00:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T371742)', diff saved to https://phabricator.wikimedia.org/P68263 and previous config saved to /var/cache/conftool/dbconfig/20240830-005512-ladsgroup.json
00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P68262 and previous config saved to /var/cache/conftool/dbconfig/20240830-005254-ladsgroup.json
00:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P68261 and previous config saved to /var/cache/conftool/dbconfig/20240830-004004-ladsgroup.json
00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P68260 and previous config saved to /var/cache/conftool/dbconfig/20240830-003746-ladsgroup.json
00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P68259 and previous config saved to /var/cache/conftool/dbconfig/20240830-002457-ladsgroup.json
00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T370903)', diff saved to https://phabricator.wikimedia.org/P68258 and previous config saved to /var/cache/conftool/dbconfig/20240830-002239-ladsgroup.json
00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T370903)', diff saved to https://phabricator.wikimedia.org/P68255 and previous config saved to /var/cache/conftool/dbconfig/20240830-001353-ladsgroup.json
00:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
00:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T370903)', diff saved to https://phabricator.wikimedia.org/P68254 and previous config saved to /var/cache/conftool/dbconfig/20240830-001331-ladsgroup.json
00:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T371742)', diff saved to https://phabricator.wikimedia.org/P68253 and previous config saved to /var/cache/conftool/dbconfig/20240830-000950-ladsgroup.json

2024-08-29

23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P68252 and previous config saved to /var/cache/conftool/dbconfig/20240829-235824-ladsgroup.json
23:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T371742)', diff saved to https://phabricator.wikimedia.org/P68251 and previous config saved to /var/cache/conftool/dbconfig/20240829-234420-ladsgroup.json
23:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
23:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
23:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P68250 and previous config saved to /var/cache/conftool/dbconfig/20240829-234317-ladsgroup.json
23:33 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T370903)', diff saved to https://phabricator.wikimedia.org/P68249 and previous config saved to /var/cache/conftool/dbconfig/20240829-232810-ladsgroup.json
23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T370903)', diff saved to https://phabricator.wikimedia.org/P68248 and previous config saved to /var/cache/conftool/dbconfig/20240829-232548-ladsgroup.json
23:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
23:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
23:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
23:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T370903)', diff saved to https://phabricator.wikimedia.org/P68247 and previous config saved to /var/cache/conftool/dbconfig/20240829-232510-ladsgroup.json
23:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
23:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
23:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T371742)', diff saved to https://phabricator.wikimedia.org/P68246 and previous config saved to /var/cache/conftool/dbconfig/20240829-232124-ladsgroup.json
23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P68245 and previous config saved to /var/cache/conftool/dbconfig/20240829-231003-ladsgroup.json
23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P68244 and previous config saved to /var/cache/conftool/dbconfig/20240829-230616-ladsgroup.json
22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P68243 and previous config saved to /var/cache/conftool/dbconfig/20240829-225456-ladsgroup.json
22:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P68242 and previous config saved to /var/cache/conftool/dbconfig/20240829-225109-ladsgroup.json
22:45 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
22:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
22:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T370903)', diff saved to https://phabricator.wikimedia.org/P68241 and previous config saved to /var/cache/conftool/dbconfig/20240829-223949-ladsgroup.json
22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T371742)', diff saved to https://phabricator.wikimedia.org/P68240 and previous config saved to /var/cache/conftool/dbconfig/20240829-223602-ladsgroup.json
22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T370903)', diff saved to https://phabricator.wikimedia.org/P68239 and previous config saved to /var/cache/conftool/dbconfig/20240829-222824-ladsgroup.json
22:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
22:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
22:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T370903)', diff saved to https://phabricator.wikimedia.org/P68238 and previous config saved to /var/cache/conftool/dbconfig/20240829-222048-ladsgroup.json
22:19 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bookworm
22:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T371742)', diff saved to https://phabricator.wikimedia.org/P68237 and previous config saved to /var/cache/conftool/dbconfig/20240829-221559-ladsgroup.json
22:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
22:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T371742)', diff saved to https://phabricator.wikimedia.org/P68236 and previous config saved to /var/cache/conftool/dbconfig/20240829-221537-ladsgroup.json
22:10 zabe: zabe@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php testwiki # T183490
22:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P68235 and previous config saved to /var/cache/conftool/dbconfig/20240829-220541-ladsgroup.json
22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P68234 and previous config saved to /var/cache/conftool/dbconfig/20240829-220030-ladsgroup.json
21:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
21:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P68233 and previous config saved to /var/cache/conftool/dbconfig/20240829-215034-ladsgroup.json
21:50 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P68232 and previous config saved to /var/cache/conftool/dbconfig/20240829-214523-ladsgroup.json
21:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
21:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
21:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T370903)', diff saved to https://phabricator.wikimedia.org/P68231 and previous config saved to /var/cache/conftool/dbconfig/20240829-213526-ladsgroup.json
21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T371742)', diff saved to https://phabricator.wikimedia.org/P68230 and previous config saved to /var/cache/conftool/dbconfig/20240829-213015-ladsgroup.json
21:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T370903)', diff saved to https://phabricator.wikimedia.org/P68229 and previous config saved to /var/cache/conftool/dbconfig/20240829-212727-ladsgroup.json
21:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1246.eqiad.wmnet with reason: Maintenance
21:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1246.eqiad.wmnet with reason: Maintenance
21:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
21:19 cmooney@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2002.codfw.wmnet with OS bookworm
21:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
21:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
21:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T370903)', diff saved to https://phabricator.wikimedia.org/P68228 and previous config saved to /var/cache/conftool/dbconfig/20240829-211642-ladsgroup.json
21:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1009.eqiad.wmnet with reason: host reimage
21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T371742)', diff saved to https://phabricator.wikimedia.org/P68227 and previous config saved to /var/cache/conftool/dbconfig/20240829-210822-ladsgroup.json
21:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
21:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T371742)', diff saved to https://phabricator.wikimedia.org/P68226 and previous config saved to /var/cache/conftool/dbconfig/20240829-210759-ladsgroup.json
21:07 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1009.eqiad.wmnet with reason: host reimage
21:04 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
21:03 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P68225 and previous config saved to /var/cache/conftool/dbconfig/20240829-210135-ladsgroup.json
20:56 urbanecm@deploy1003: Finished scap sync-world: Backport for Turn on Parsoid Read Views for eo/sv/fi wikivoyage (T372810), Add project talk aliases for mnwiki (T366271) (duration: 13m 16s)
20:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
20:54 eileen: civicrm upgraded from 916cad45 to 27b1f673
20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P68224 and previous config saved to /var/cache/conftool/dbconfig/20240829-205252-ladsgroup.json
20:51 urbanecm@deploy1003: urbanecm, srishakatux, cscott: Continuing with sync
20:49 cmooney@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2002.codfw.wmnet with OS bookworm
20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P68223 and previous config saved to /var/cache/conftool/dbconfig/20240829-204628-ladsgroup.json
20:44 urbanecm@deploy1003: urbanecm, srishakatux, cscott: Backport for Turn on Parsoid Read Views for eo/sv/fi wikivoyage (T372810), Add project talk aliases for mnwiki (T366271) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:42 urbanecm@deploy1003: Started scap sync-world: Backport for Turn on Parsoid Read Views for eo/sv/fi wikivoyage (T372810), Add project talk aliases for mnwiki (T366271)
20:42 urbanecm@deploy1003: Finished scap sync-world: Backport for kuswiki: add custom logos (T368868), bewwiki: add custom logos (T368868), Enable AutoModerator on id.wiki (T365792) (duration: 07m 50s)
20:38 urbanecm@deploy1003: kgraessle, urbanecm, chlod: Continuing with sync
20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P68222 and previous config saved to /var/cache/conftool/dbconfig/20240829-203745-ladsgroup.json
20:36 urbanecm@deploy1003: kgraessle, urbanecm, chlod: Backport for kuswiki: add custom logos (T368868), bewwiki: add custom logos (T368868), Enable AutoModerator on id.wiki (T365792) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:34 urbanecm@deploy1003: Started scap sync-world: Backport for kuswiki: add custom logos (T368868), bewwiki: add custom logos (T368868), Enable AutoModerator on id.wiki (T365792)
20:33 urbanecm@deploy1003: Finished scap sync-world: Backport for kawikisource: re-add custom logos (T368868), kaawiktionary: re-add custom logos (T368868), iglwiki: add custom logos (T368868), mywikisource: add custom logos (T368868) (duration: 10m 48s)
20:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T370903)', diff saved to https://phabricator.wikimedia.org/P68221 and previous config saved to /var/cache/conftool/dbconfig/20240829-203120-ladsgroup.json
20:28 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
20:28 urbanecm@deploy1003: urbanecm, chlod: Continuing with sync
20:28 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
20:26 urbanecm@deploy1003: urbanecm, chlod: Backport for kawikisource: re-add custom logos (T368868), kaawiktionary: re-add custom logos (T368868), iglwiki: add custom logos (T368868), mywikisource: add custom logos (T368868) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:23 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
20:22 urbanecm@deploy1003: Started scap sync-world: Backport for kawikisource: re-add custom logos (T368868), kaawiktionary: re-add custom logos (T368868), iglwiki: add custom logos (T368868), mywikisource: add custom logos (T368868)
20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T371742)', diff saved to https://phabricator.wikimedia.org/P68220 and previous config saved to /var/cache/conftool/dbconfig/20240829-202238-ladsgroup.json
20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T370903)', diff saved to https://phabricator.wikimedia.org/P68219 and previous config saved to /var/cache/conftool/dbconfig/20240829-202231-ladsgroup.json
20:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1233.eqiad.wmnet with reason: Maintenance
20:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1233.eqiad.wmnet with reason: Maintenance
20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T370903)', diff saved to https://phabricator.wikimedia.org/P68218 and previous config saved to /var/cache/conftool/dbconfig/20240829-202209-ladsgroup.json
20:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2002 - cmooney@cumin1002"
20:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2002 - cmooney@cumin1002"
20:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox
20:17 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2002.codfw.wmnet
20:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P68217 and previous config saved to /var/cache/conftool/dbconfig/20240829-200701-ladsgroup.json
19:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T371742)', diff saved to https://phabricator.wikimedia.org/P68216 and previous config saved to /var/cache/conftool/dbconfig/20240829-195609-ladsgroup.json
19:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
19:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T371742)', diff saved to https://phabricator.wikimedia.org/P68215 and previous config saved to /var/cache/conftool/dbconfig/20240829-195547-ladsgroup.json
19:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P68214 and previous config saved to /var/cache/conftool/dbconfig/20240829-195154-ladsgroup.json
19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P68213 and previous config saved to /var/cache/conftool/dbconfig/20240829-194040-ladsgroup.json
19:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T370903)', diff saved to https://phabricator.wikimedia.org/P68212 and previous config saved to /var/cache/conftool/dbconfig/20240829-193647-ladsgroup.json
19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T370903)', diff saved to https://phabricator.wikimedia.org/P68211 and previous config saved to /var/cache/conftool/dbconfig/20240829-193436-ladsgroup.json
19:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1229.eqiad.wmnet with reason: Maintenance
19:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1229.eqiad.wmnet with reason: Maintenance
19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P68210 and previous config saved to /var/cache/conftool/dbconfig/20240829-192533-ladsgroup.json
19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T370903)', diff saved to https://phabricator.wikimedia.org/P68209 and previous config saved to /var/cache/conftool/dbconfig/20240829-192409-ladsgroup.json
19:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T371742)', diff saved to https://phabricator.wikimedia.org/P68208 and previous config saved to /var/cache/conftool/dbconfig/20240829-191026-ladsgroup.json
19:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P68207 and previous config saved to /var/cache/conftool/dbconfig/20240829-190902-ladsgroup.json
19:06 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2002.codfw.wmnet
18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P68206 and previous config saved to /var/cache/conftool/dbconfig/20240829-185355-ladsgroup.json
18:52 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2051.codfw.wmnet
18:52 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2051.codfw.wmnet
18:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T371742)', diff saved to https://phabricator.wikimedia.org/P68205 and previous config saved to /var/cache/conftool/dbconfig/20240829-184242-ladsgroup.json
18:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
18:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
18:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T371742)', diff saved to https://phabricator.wikimedia.org/P68204 and previous config saved to /var/cache/conftool/dbconfig/20240829-184220-ladsgroup.json
18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T370903)', diff saved to https://phabricator.wikimedia.org/P68203 and previous config saved to /var/cache/conftool/dbconfig/20240829-183848-ladsgroup.json
18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T370903)', diff saved to https://phabricator.wikimedia.org/P68202 and previous config saved to /var/cache/conftool/dbconfig/20240829-183638-ladsgroup.json
18:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
18:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68201 and previous config saved to /var/cache/conftool/dbconfig/20240829-183616-ladsgroup.json
18:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P68200 and previous config saved to /var/cache/conftool/dbconfig/20240829-182713-ladsgroup.json
18:24 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@abb06c4]: Deploy latest Analytics Airflow DAGs to pickup T373402 (duration: 00m 42s)
18:23 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@abb06c4]: Deploy latest Analytics Airflow DAGs to pickup T373402
18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P68199 and previous config saved to /var/cache/conftool/dbconfig/20240829-182108-ladsgroup.json
18:15 kamila_: running homer after wikikube-worker2051 rename
18:12 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2051.codfw.wmnet with OS bullseye
18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P68198 and previous config saved to /var/cache/conftool/dbconfig/20240829-181205-ladsgroup.json
18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P68197 and previous config saved to /var/cache/conftool/dbconfig/20240829-180601-ladsgroup.json
17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T371742)', diff saved to https://phabricator.wikimedia.org/P68196 and previous config saved to /var/cache/conftool/dbconfig/20240829-175658-ladsgroup.json
17:51 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2051.codfw.wmnet with reason: host reimage
17:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68195 and previous config saved to /var/cache/conftool/dbconfig/20240829-175053-ladsgroup.json
17:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T370903)', diff saved to https://phabricator.wikimedia.org/P68194 and previous config saved to /var/cache/conftool/dbconfig/20240829-174842-ladsgroup.json
17:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
17:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
17:48 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2051.codfw.wmnet with reason: host reimage
17:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T370903)', diff saved to https://phabricator.wikimedia.org/P68193 and previous config saved to /var/cache/conftool/dbconfig/20240829-174820-ladsgroup.json
17:39 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T371742)', diff saved to https://phabricator.wikimedia.org/P68192 and previous config saved to /var/cache/conftool/dbconfig/20240829-173416-ladsgroup.json
17:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
17:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
17:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P68191 and previous config saved to /var/cache/conftool/dbconfig/20240829-173313-ladsgroup.json
17:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T367856)', diff saved to https://phabricator.wikimedia.org/P68190 and previous config saved to /var/cache/conftool/dbconfig/20240829-173303-marostegui.json
17:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2167.codfw.wmnet with reason: Maintenance
17:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2167.codfw.wmnet with reason: Maintenance
17:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T367856)', diff saved to https://phabricator.wikimedia.org/P68189 and previous config saved to /var/cache/conftool/dbconfig/20240829-173240-marostegui.json
17:32 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2051
17:32 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2051
17:31 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2051
17:31 kamila@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2051.codfw.wmnet 65.0.192.10.in-addr.arpa 5.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
17:31 kamila@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2051.codfw.wmnet 65.0.192.10.in-addr.arpa 5.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
17:31 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:31 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2051 - kamila@cumin1002"
17:31 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2051 - kamila@cumin1002"
17:28 kamila@cumin1002: START - Cookbook sre.dns.netbox
17:27 kamila@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2051
17:27 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2051.codfw.wmnet with OS bullseye
17:27 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2401 to wikikube-worker2051
17:26 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2051
17:26 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2051
17:26 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:26 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2401 to wikikube-worker2051 - kamila@cumin1002"
17:25 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2401 to wikikube-worker2051 - kamila@cumin1002"
17:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
17:21 kamila@cumin1002: START - Cookbook sre.dns.netbox
17:21 kamila@cumin1002: START - Cookbook sre.hosts.rename from mw2401 to wikikube-worker2051
17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P68188 and previous config saved to /var/cache/conftool/dbconfig/20240829-171759-ladsgroup.json
17:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P68187 and previous config saved to /var/cache/conftool/dbconfig/20240829-171733-marostegui.json
17:17 kamila@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host mw2401.codfw.wmnet
17:16 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2401.codfw.wmnet
17:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T370903)', diff saved to https://phabricator.wikimedia.org/P68186 and previous config saved to /var/cache/conftool/dbconfig/20240829-170252-ladsgroup.json
17:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P68185 and previous config saved to /var/cache/conftool/dbconfig/20240829-170224-marostegui.json
16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T370903)', diff saved to https://phabricator.wikimedia.org/P68184 and previous config saved to /var/cache/conftool/dbconfig/20240829-165341-ladsgroup.json
16:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T370903)', diff saved to https://phabricator.wikimedia.org/P68183 and previous config saved to /var/cache/conftool/dbconfig/20240829-165319-ladsgroup.json
16:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
16:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
16:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T367856)', diff saved to https://phabricator.wikimedia.org/P68182 and previous config saved to /var/cache/conftool/dbconfig/20240829-164717-marostegui.json
16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P68180 and previous config saved to /var/cache/conftool/dbconfig/20240829-163811-ladsgroup.json
16:27 topranks: update qos configuration for asw2-ulsfo to use traffic-control profile T373594
16:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68179 and previous config saved to /var/cache/conftool/dbconfig/20240829-162601-ladsgroup.json
16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P68178 and previous config saved to /var/cache/conftool/dbconfig/20240829-162304-ladsgroup.json
16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P68177 and previous config saved to /var/cache/conftool/dbconfig/20240829-161054-ladsgroup.json
16:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T370903)', diff saved to https://phabricator.wikimedia.org/P68176 and previous config saved to /var/cache/conftool/dbconfig/20240829-160757-ladsgroup.json
16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T370903)', diff saved to https://phabricator.wikimedia.org/P68175 and previous config saved to /var/cache/conftool/dbconfig/20240829-160447-ladsgroup.json
16:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
16:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68174 and previous config saved to /var/cache/conftool/dbconfig/20240829-160425-ladsgroup.json
16:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
15:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P68173 and previous config saved to /var/cache/conftool/dbconfig/20240829-155547-ladsgroup.json
15:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P68172 and previous config saved to /var/cache/conftool/dbconfig/20240829-155431-ladsgroup.json
15:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P68171 and previous config saved to /var/cache/conftool/dbconfig/20240829-154917-ladsgroup.json
15:42 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
15:42 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68170 and previous config saved to /var/cache/conftool/dbconfig/20240829-154040-ladsgroup.json
15:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P68169 and previous config saved to /var/cache/conftool/dbconfig/20240829-153925-ladsgroup.json
15:39 sukhe: re-enable puppet on lvs4010
15:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
15:35 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P68168 and previous config saved to /var/cache/conftool/dbconfig/20240829-153410-ladsgroup.json
15:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
15:33 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
15:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
15:30 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
15:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
15:29 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P68167 and previous config saved to /var/cache/conftool/dbconfig/20240829-152419-ladsgroup.json
15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T371742)', diff saved to https://phabricator.wikimedia.org/P68166 and previous config saved to /var/cache/conftool/dbconfig/20240829-152058-ladsgroup.json
15:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
15:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T371742)', diff saved to https://phabricator.wikimedia.org/P68165 and previous config saved to /var/cache/conftool/dbconfig/20240829-152036-ladsgroup.json
15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68164 and previous config saved to /var/cache/conftool/dbconfig/20240829-151903-ladsgroup.json
15:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-lab1002.eqiad.wmnet with OS bookworm
15:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:11 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
15:10 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
15:10 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
15:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68163 and previous config saved to /var/cache/conftool/dbconfig/20240829-151000-ladsgroup.json
15:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
15:09 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
15:09 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
15:09 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
15:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
15:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
15:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T370903)', diff saved to https://phabricator.wikimedia.org/P68162 and previous config saved to /var/cache/conftool/dbconfig/20240829-150846-ladsgroup.json
15:08 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2297.codfw.wmnet
15:07 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2297.codfw.wmnet
15:07 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2296.codfw.wmnet
15:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
15:07 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2296.codfw.wmnet
15:07 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2295.codfw.wmnet
15:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
15:06 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2295.codfw.wmnet
15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P68161 and previous config saved to /var/cache/conftool/dbconfig/20240829-150529-ladsgroup.json
15:04 mutante: releases* - temp disable puppet, maintenance for java version upgrade
15:04 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
15:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd1003.eqiad.wmnet with OS bookworm
15:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:03 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
14:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd1002.eqiad.wmnet with OS bookworm
14:59 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:59 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
14:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: testing T358260
14:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd1004.eqiad.wmnet with OS bookworm
14:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:56 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: testing T358260
14:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2014.codfw.wmnet
14:56 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2014.codfw.wmnet
14:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: testing T358260
14:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:55 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: testing T358260
14:55 sukhe: downtiming lvs4010 to test T358260
14:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd1001.eqiad.wmnet with OS bookworm
14:53 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:52 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P68160 and previous config saved to /var/cache/conftool/dbconfig/20240829-145021-ladsgroup.json
14:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd1003.eqiad.wmnet with reason: host reimage
14:42 jgiannelos@deploy1003: Finished deploy [restbase/deploy@5a4727a]: (no justification provided) (duration: 16m 35s)
14:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd1002.eqiad.wmnet with reason: host reimage
14:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
14:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1002.eqiad.wmnet with OS bookworm
14:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd1004.eqiad.wmnet with reason: host reimage
14:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd1001.eqiad.wmnet with reason: host reimage
14:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T371742)', diff saved to https://phabricator.wikimedia.org/P68159 and previous config saved to /var/cache/conftool/dbconfig/20240829-143514-ladsgroup.json
14:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd1004.eqiad.wmnet with reason: host reimage
14:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd1003.eqiad.wmnet with reason: host reimage
14:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd1002.eqiad.wmnet with reason: host reimage
14:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd1001.eqiad.wmnet with reason: host reimage
14:25 jgiannelos@deploy1003: Started deploy [restbase/deploy@5a4727a]: (no justification provided)
13:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2050.codfw.wmnet
13:59 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2050.codfw.wmnet
13:55 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P68154 and previous config saved to /var/cache/conftool/dbconfig/20240829-135537-ladsgroup.json
13:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P68153 and previous config saved to /var/cache/conftool/dbconfig/20240829-135430-ladsgroup.json
13:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
13:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1002.eqiad.wmnet with OS bookworm
13:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host logging-sd1004.mgmt.eqiad.wmnet with reboot policy FORCED
13:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host logging-sd1003.mgmt.eqiad.wmnet with reboot policy FORCED
13:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host logging-sd1002.mgmt.eqiad.wmnet with reboot policy FORCED
13:50 topranks: add qos interface schedulers on lsw1-d4-codfw T339850
13:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host logging-sd1001.mgmt.eqiad.wmnet with reboot policy FORCED
13:49 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt logging-sd1 - jclark@cumin1002"
13:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt logging-sd1 - jclark@cumin1002"
13:44 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P68152 and previous config saved to /var/cache/conftool/dbconfig/20240829-134030-ladsgroup.json
13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P68151 and previous config saved to /var/cache/conftool/dbconfig/20240829-133923-ladsgroup.json
13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
13:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
13:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T370903)', diff saved to https://phabricator.wikimedia.org/P68150 and previous config saved to /var/cache/conftool/dbconfig/20240829-132523-ladsgroup.json
13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T371742)', diff saved to https://phabricator.wikimedia.org/P68149 and previous config saved to /var/cache/conftool/dbconfig/20240829-132416-ladsgroup.json
13:13 samtar@deploy1003: Finished scap sync-world: Backport for Activate feature flag for moving wikibase item to Other Projects sidebar in pilot wikis. (duration: 10m 28s)
13:08 samtar@deploy1003: joelyrookewmde, samtar: Continuing with sync
13:06 samtar@deploy1003: joelyrookewmde, samtar: Backport for Activate feature flag for moving wikibase item to Other Projects sidebar in pilot wikis. synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:02 samtar@deploy1003: Started scap sync-world: Backport for Activate feature flag for moving wikibase item to Other Projects sidebar in pilot wikis.
13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T371742)', diff saved to https://phabricator.wikimedia.org/P68148 and previous config saved to /var/cache/conftool/dbconfig/20240829-130029-ladsgroup.json
13:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
13:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T371742)', diff saved to https://phabricator.wikimedia.org/P68147 and previous config saved to /var/cache/conftool/dbconfig/20240829-130006-ladsgroup.json
12:51 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow (duration: 00m 09s)
12:51 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow
12:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P68146 and previous config saved to /var/cache/conftool/dbconfig/20240829-124459-ladsgroup.json
12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P68145 and previous config saved to /var/cache/conftool/dbconfig/20240829-122951-ladsgroup.json
12:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T370903)', diff saved to https://phabricator.wikimedia.org/P68144 and previous config saved to /var/cache/conftool/dbconfig/20240829-122527-ladsgroup.json
12:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1172.eqiad.wmnet with reason: Maintenance
12:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1172.eqiad.wmnet with reason: Maintenance
12:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from test-s4 to test-s4
12:22 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s4 to test-s4
12:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
12:20 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T371742)', diff saved to https://phabricator.wikimedia.org/P68143 and previous config saved to /var/cache/conftool/dbconfig/20240829-121444-ladsgroup.json
12:10 hnowlan: homer 'lsw1-a3-codfw*' commit
12:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.renumber-node (exit_code=0) Renumbering for host wikikube-worker2031.codfw.wmnet
12:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2031.codfw.wmnet
12:00 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2031.codfw.wmnet
11:56 claime: homer lsw1-a6-codfw* commit 'T372878'
11:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2031.codfw.wmnet with OS bullseye
11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T371742)', diff saved to https://phabricator.wikimedia.org/P68142 and previous config saved to /var/cache/conftool/dbconfig/20240829-115222-ladsgroup.json
11:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
11:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from test-s4 to test-s4
11:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T371742)', diff saved to https://phabricator.wikimedia.org/P68141 and previous config saved to /var/cache/conftool/dbconfig/20240829-115200-ladsgroup.json
11:51 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2050.codfw.wmnet with OS bullseye
11:51 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s4 to test-s4
11:51 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:51 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
11:41 topranks: modify qos configuration for asw2-ulsfo xe-2/0/18 (ganeti4006) to add traffic-control-profile T339850
11:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P68140 and previous config saved to /var/cache/conftool/dbconfig/20240829-113652-ladsgroup.json
11:35 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
11:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
11:34 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
11:32 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2050.codfw.wmnet with reason: host reimage
11:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
11:32 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
11:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
11:32 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
11:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
11:31 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
11:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
11:31 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
11:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
11:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
11:30 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
11:29 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2050.codfw.wmnet with reason: host reimage
11:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
11:24 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
11:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
11:22 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
11:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P68139 and previous config saved to /var/cache/conftool/dbconfig/20240829-112145-ladsgroup.json
11:17 claime: homer cr*codfw* commit 'T372878'
11:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T370903)', diff saved to https://phabricator.wikimedia.org/P68138 and previous config saved to /var/cache/conftool/dbconfig/20240829-111351-ladsgroup.json
11:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2050
11:13 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2050
11:13 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2050.codfw.wmnet with OS bullseye
11:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f8d70a81b80>
11:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2031
11:11 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2031
11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2031.codfw.wmnet 179.0.192.10.in-addr.arpa 9.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:11 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2031.codfw.wmnet 179.0.192.10.in-addr.arpa 9.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2031 - cgoubert@cumin1002"
11:10 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2031 - cgoubert@cumin1002"
11:07 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T371742)', diff saved to https://phabricator.wikimedia.org/P68137 and previous config saved to /var/cache/conftool/dbconfig/20240829-110637-ladsgroup.json
11:06 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f8d70a81b80>
11:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2031.codfw.wmnet with OS bullseye
11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2031.codfw.wmnet
11:02 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2031.codfw.wmnet
11:02 cgoubert@cumin1002: START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2031.codfw.wmnet
10:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P68136 and previous config saved to /var/cache/conftool/dbconfig/20240829-105844-ladsgroup.json
10:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2005.wikimedia.org
10:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test2005.wikimedia.org with OS bookworm
10:49 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2050.codfw.wmnet with OS bullseye
10:49 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2050.codfw.wmnet with OS bullseye
10:48 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2050.codfw.wmnet on all recursors
10:48 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2050.codfw.wmnet on all recursors
10:48 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2050.codfw.wmnet with OS bullseye
10:48 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2050.codfw.wmnet with OS bullseye
10:47 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2380 to wikikube-worker2050
10:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2050
10:46 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2050
10:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:46 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2380 to wikikube-worker2050 - hnowlan@cumin1002"
10:44 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2380 to wikikube-worker2050 - hnowlan@cumin1002"
10:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P68134 and previous config saved to /var/cache/conftool/dbconfig/20240829-104336-ladsgroup.json
10:38 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
10:37 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2380 to wikikube-worker2050
10:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T371742)', diff saved to https://phabricator.wikimedia.org/P68133 and previous config saved to /var/cache/conftool/dbconfig/20240829-103724-ladsgroup.json
10:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
10:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.renumber-node (exit_code=0) Renumbering for host wikikube-worker2010.codfw.wmnet
10:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2010.codfw.wmnet
10:37 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2010.codfw.wmnet
10:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
10:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T371742)', diff saved to https://phabricator.wikimedia.org/P68132 and previous config saved to /var/cache/conftool/dbconfig/20240829-103702-ladsgroup.json
10:36 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow (duration: 00m 10s)
10:36 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow
10:34 claime: homer lsw1-b6-codfw* commit 'T372878'
10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2010.codfw.wmnet with OS bullseye
10:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
10:29 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
10:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T370903)', diff saved to https://phabricator.wikimedia.org/P68131 and previous config saved to /var/cache/conftool/dbconfig/20240829-102829-ladsgroup.json
10:23 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
10:23 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
10:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P68130 and previous config saved to /var/cache/conftool/dbconfig/20240829-102155-ladsgroup.json
10:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
10:09 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
10:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P68128 and previous config saved to /var/cache/conftool/dbconfig/20240829-100648-ladsgroup.json
10:05 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2048.codfw.wmnet on all recursors
10:05 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2048.codfw.wmnet on all recursors
10:02 claime: homer cr*codfw* commit 'T372878'
10:01 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2048.codfw.wmnet
10:01 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2048.codfw.wmnet
10:00 akosiaris: T372878 wikikube-worker2048.codfw.wmnet updated in netbox and homer running
09:58 topranks: apply qos classifiers and schedulers to interfaces on ulsfo CRs T339850
09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fa8c9ceef40>
09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2010
09:52 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2010
09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2010.codfw.wmnet 198.16.192.10.in-addr.arpa 8.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:52 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2010.codfw.wmnet 198.16.192.10.in-addr.arpa 8.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2010 - cgoubert@cumin1002"
09:52 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2010 - cgoubert@cumin1002"
09:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T371742)', diff saved to https://phabricator.wikimedia.org/P68127 and previous config saved to /var/cache/conftool/dbconfig/20240829-095141-ladsgroup.json
09:48 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:48 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fa8c9ceef40>
09:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2010.codfw.wmnet with OS bullseye
09:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2010.codfw.wmnet
09:46 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2010.codfw.wmnet
09:46 cgoubert@cumin1002: START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2010.codfw.wmnet
09:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:44 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:43 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s4 to test-s4
09:32 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s4 to test-s4
09:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T370903)', diff saved to https://phabricator.wikimedia.org/P68126 and previous config saved to /var/cache/conftool/dbconfig/20240829-092819-ladsgroup.json
09:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1167.eqiad.wmnet with reason: Maintenance
09:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1167.eqiad.wmnet with reason: Maintenance
09:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T371742)', diff saved to https://phabricator.wikimedia.org/P68125 and previous config saved to /var/cache/conftool/dbconfig/20240829-092547-ladsgroup.json
09:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
09:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
09:24 topranks: apply qos classifiers and schedulers to interfaces on asw2-ulsfo T339850
09:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "idp-test2005 - ayounsi@cumin1002"
09:24 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "idp-test2005 - ayounsi@cumin1002"
09:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test2005.wikimedia.org with reason: host reimage
09:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2380.codfw.wmnet
09:13 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2380.codfw.wmnet
09:13 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test2005.wikimedia.org with reason: host reimage
09:06 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow (duration: 00m 11s)
09:06 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow
08:59 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS bookworm
08:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2005.wikimedia.org - ayounsi@cumin1002"
08:58 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2005.wikimedia.org - ayounsi@cumin1002"
08:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test2005.wikimedia.org on all recursors
08:58 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test2005.wikimedia.org on all recursors
08:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2005.wikimedia.org - ayounsi@cumin1002"
08:58 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2005.wikimedia.org - ayounsi@cumin1002"
08:51 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
08:51 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
08:41 hashar@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.20 refs T366965
07:53 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet
07:47 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
07:46 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host snapshot1011.eqiad.wmnet
07:46 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
07:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: Testing
07:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: Testing
07:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T371742)', diff saved to https://phabricator.wikimedia.org/P68124 and previous config saved to /var/cache/conftool/dbconfig/20240829-070017-ladsgroup.json
06:55 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@cb0bc4d]: (no justification provided) (duration: 00m 03s)
06:55 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@cb0bc4d]: (no justification provided)
06:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P68123 and previous config saved to /var/cache/conftool/dbconfig/20240829-064508-ladsgroup.json
06:30 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow (duration: 00m 10s)
06:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P68122 and previous config saved to /var/cache/conftool/dbconfig/20240829-063000-ladsgroup.json
06:29 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@cb0bc4d]: Test Refine through Airflow
06:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T371742)', diff saved to https://phabricator.wikimedia.org/P68121 and previous config saved to /var/cache/conftool/dbconfig/20240829-061453-ladsgroup.json
04:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T371742)', diff saved to https://phabricator.wikimedia.org/P68120 and previous config saved to /var/cache/conftool/dbconfig/20240829-041348-ladsgroup.json
04:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2219.codfw.wmnet with reason: Maintenance
04:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2219.codfw.wmnet with reason: Maintenance
04:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T371742)', diff saved to https://phabricator.wikimedia.org/P68119 and previous config saved to /var/cache/conftool/dbconfig/20240829-041326-ladsgroup.json
03:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P68118 and previous config saved to /var/cache/conftool/dbconfig/20240829-035817-ladsgroup.json
03:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P68117 and previous config saved to /var/cache/conftool/dbconfig/20240829-034310-ladsgroup.json
03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T371742)', diff saved to https://phabricator.wikimedia.org/P68116 and previous config saved to /var/cache/conftool/dbconfig/20240829-032803-ladsgroup.json
01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T371742)', diff saved to https://phabricator.wikimedia.org/P68115 and previous config saved to /var/cache/conftool/dbconfig/20240829-012759-ladsgroup.json
01:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2210.codfw.wmnet with reason: Maintenance
01:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2210.codfw.wmnet with reason: Maintenance
01:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T371742)', diff saved to https://phabricator.wikimedia.org/P68114 and previous config saved to /var/cache/conftool/dbconfig/20240829-012736-ladsgroup.json
01:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P68113 and previous config saved to /var/cache/conftool/dbconfig/20240829-011229-ladsgroup.json
00:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P68112 and previous config saved to /var/cache/conftool/dbconfig/20240829-005722-ladsgroup.json
00:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T371742)', diff saved to https://phabricator.wikimedia.org/P68111 and previous config saved to /var/cache/conftool/dbconfig/20240829-004215-ladsgroup.json
00:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T370903)', diff saved to https://phabricator.wikimedia.org/P68110 and previous config saved to /var/cache/conftool/dbconfig/20240829-001215-ladsgroup.json

2024-08-28

23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P68109 and previous config saved to /var/cache/conftool/dbconfig/20240828-235708-ladsgroup.json
23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P68108 and previous config saved to /var/cache/conftool/dbconfig/20240828-234201-ladsgroup.json
23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T370903)', diff saved to https://phabricator.wikimedia.org/P68107 and previous config saved to /var/cache/conftool/dbconfig/20240828-232653-ladsgroup.json
23:10 eileen: config revision changed from cb9b3655 to af0aadef re-enable dedupe contacts from start
23:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T370903)', diff saved to https://phabricator.wikimedia.org/P68106 and previous config saved to /var/cache/conftool/dbconfig/20240828-230748-ladsgroup.json
23:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2205.codfw.wmnet with reason: Maintenance
23:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2205.codfw.wmnet with reason: Maintenance
23:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T370903)', diff saved to https://phabricator.wikimedia.org/P68105 and previous config saved to /var/cache/conftool/dbconfig/20240828-230726-ladsgroup.json
22:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P68104 and previous config saved to /var/cache/conftool/dbconfig/20240828-225218-ladsgroup.json
22:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P68103 and previous config saved to /var/cache/conftool/dbconfig/20240828-223711-ladsgroup.json
22:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T371742)', diff saved to https://phabricator.wikimedia.org/P68102 and previous config saved to /var/cache/conftool/dbconfig/20240828-223325-ladsgroup.json
22:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2206.codfw.wmnet with reason: Maintenance
22:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2206.codfw.wmnet with reason: Maintenance
22:23 swfrench-wmf: running homer 'cr*codfw*' commit 'T372878'
22:22 ryankemper: [WDQS] `ryankemper@wdqs1015:~$ sudo systemctl restart wdqs-blazegraph`
22:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T370903)', diff saved to https://phabricator.wikimedia.org/P68101 and previous config saved to /var/cache/conftool/dbconfig/20240828-222204-ladsgroup.json
22:17 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2049.codfw.wmnet
22:17 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2049.codfw.wmnet
22:14 swfrench-wmf: running homer 'lsw1-b3-codfw*' commit 'T372878'
22:11 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2049.codfw.wmnet with OS bullseye
22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T370903)', diff saved to https://phabricator.wikimedia.org/P68100 and previous config saved to /var/cache/conftool/dbconfig/20240828-220318-ladsgroup.json
22:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2194.codfw.wmnet with reason: Maintenance
22:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2194.codfw.wmnet with reason: Maintenance
22:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T370903)', diff saved to https://phabricator.wikimedia.org/P68099 and previous config saved to /var/cache/conftool/dbconfig/20240828-220256-ladsgroup.json
21:50 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2049.codfw.wmnet with reason: host reimage
21:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P68098 and previous config saved to /var/cache/conftool/dbconfig/20240828-214749-ladsgroup.json
21:46 swfrench@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2049.codfw.wmnet with reason: host reimage
21:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:39 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:39 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:33 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:33 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:32 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P68097 and previous config saved to /var/cache/conftool/dbconfig/20240828-213242-ladsgroup.json
21:32 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:31 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:30 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:29 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2049
21:29 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2049
21:28 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2049
21:28 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2049.codfw.wmnet 59.16.192.10.in-addr.arpa 9.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:28 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2049.codfw.wmnet 59.16.192.10.in-addr.arpa 9.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:28 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:28 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2049 - swfrench@cumin2002"
21:28 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2049 - swfrench@cumin2002"
21:26 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:26 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:24 swfrench@cumin2002: START - Cookbook sre.dns.netbox
21:23 swfrench@cumin2002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2049
21:23 swfrench@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2049.codfw.wmnet with OS bullseye
21:22 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2049.codfw.wmnet on all recursors
21:22 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2049.codfw.wmnet on all recursors
21:21 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2029 to wikikube-worker2049
21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2049
21:20 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2049
21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2029 to wikikube-worker2049 - swfrench@cumin2002"
21:20 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2029 to wikikube-worker2049 - swfrench@cumin2002"
21:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T370903)', diff saved to https://phabricator.wikimedia.org/P68096 and previous config saved to /var/cache/conftool/dbconfig/20240828-211734-ladsgroup.json
21:16 swfrench@cumin2002: START - Cookbook sre.dns.netbox
21:15 swfrench@cumin2002: START - Cookbook sre.hosts.rename from kubernetes2029 to wikikube-worker2049
21:10 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2029.codfw.wmnet
21:10 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2029.codfw.wmnet
20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T370903)', diff saved to https://phabricator.wikimedia.org/P68095 and previous config saved to /var/cache/conftool/dbconfig/20240828-205834-ladsgroup.json
20:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2190.codfw.wmnet with reason: Maintenance
20:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2190.codfw.wmnet with reason: Maintenance
20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T370903)', diff saved to https://phabricator.wikimedia.org/P68094 and previous config saved to /var/cache/conftool/dbconfig/20240828-205812-ladsgroup.json
20:54 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
20:53 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
20:51 cjming: end of UTC late backport window
20:49 cjming@deploy1003: Finished scap sync-world: Backport for auth: Relax AuthManager session state check while cde00b55 is deployed (T373504), Fix missing definition of setSaveErrorMessage too (T373288), CentralAuthApiSessionProvider: Avoid error in internal API requests (T373507) (duration: 11m 31s)
20:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2199.codfw.wmnet with reason: Maintenance
20:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2199.codfw.wmnet with reason: Maintenance
20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T371742)', diff saved to https://phabricator.wikimedia.org/P68093 and previous config saved to /var/cache/conftool/dbconfig/20240828-204715-ladsgroup.json
20:44 cjming@deploy1003: matmarex, cjming: Continuing with sync
20:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P68092 and previous config saved to /var/cache/conftool/dbconfig/20240828-204305-ladsgroup.json
20:39 cjming@deploy1003: matmarex, cjming: Backport for auth: Relax AuthManager session state check while cde00b55 is deployed (T373504), Fix missing definition of setSaveErrorMessage too (T373288), CentralAuthApiSessionProvider: Avoid error in internal API requests (T373507) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:37 cjming@deploy1003: Started scap sync-world: Backport for auth: Relax AuthManager session state check while cde00b55 is deployed (T373504), Fix missing definition of setSaveErrorMessage too (T373288), CentralAuthApiSessionProvider: Avoid error in internal API requests (T373507)
20:35 cjming@deploy1003: Finished scap sync-world: Backport for Disable HLS VP9 video tracks in TimedMediaHandler (T373546) (duration: 08m 10s)
20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P68091 and previous config saved to /var/cache/conftool/dbconfig/20240828-203208-ladsgroup.json
20:31 cjming@deploy1003: bvibber, cjming: Continuing with sync
20:29 cjming@deploy1003: bvibber, cjming: Backport for Disable HLS VP9 video tracks in TimedMediaHandler (T373546) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P68090 and previous config saved to /var/cache/conftool/dbconfig/20240828-202757-ladsgroup.json
20:27 cjming@deploy1003: Started scap sync-world: Backport for Disable HLS VP9 video tracks in TimedMediaHandler (T373546)
20:26 cjming@deploy1003: Finished scap sync-world: Backport for logging: Use '??=' operator to reduce repetition (duration: 06m 39s)
20:21 cjming@deploy1003: cjming, matmarex: Continuing with sync
20:21 cjming@deploy1003: cjming, matmarex: Backport for logging: Use '??=' operator to reduce repetition synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:19 cjming@deploy1003: Started scap sync-world: Backport for logging: Use '??=' operator to reduce repetition
20:17 cjming@deploy1003: Finished scap sync-world: Backport for Lift IP cap on this dates 10/09, 17/09, 24/09 for edit-a-thon for eswiki, commons and wikidata (T373468) (duration: 11m 02s)
20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P68089 and previous config saved to /var/cache/conftool/dbconfig/20240828-201701-ladsgroup.json
20:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T370903)', diff saved to https://phabricator.wikimedia.org/P68088 and previous config saved to /var/cache/conftool/dbconfig/20240828-201250-ladsgroup.json
20:12 cjming@deploy1003: cjming, gergesshamon: Continuing with sync
20:09 cjming@deploy1003: cjming, gergesshamon: Backport for Lift IP cap on this dates 10/09, 17/09, 24/09 for edit-a-thon for eswiki, commons and wikidata (T373468) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 cjming@deploy1003: Started scap sync-world: Backport for Lift IP cap on this dates 10/09, 17/09, 24/09 for edit-a-thon for eswiki, commons and wikidata (T373468)
20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T371742)', diff saved to https://phabricator.wikimedia.org/P68087 and previous config saved to /var/cache/conftool/dbconfig/20240828-200154-ladsgroup.json
19:59 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
19:58 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
19:54 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T370903)', diff saved to https://phabricator.wikimedia.org/P68086 and previous config saved to /var/cache/conftool/dbconfig/20240828-195401-ladsgroup.json
19:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
19:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68085 and previous config saved to /var/cache/conftool/dbconfig/20240828-195339-ladsgroup.json
19:51 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P68084 and previous config saved to /var/cache/conftool/dbconfig/20240828-193832-ladsgroup.json
19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P68083 and previous config saved to /var/cache/conftool/dbconfig/20240828-192325-ladsgroup.json
19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68082 and previous config saved to /var/cache/conftool/dbconfig/20240828-190817-ladsgroup.json
19:02 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
18:59 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2048.codfw.wmnet with OS bullseye
18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T370903)', diff saved to https://phabricator.wikimedia.org/P68081 and previous config saved to /var/cache/conftool/dbconfig/20240828-184950-ladsgroup.json
18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
18:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
18:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T370903)', diff saved to https://phabricator.wikimedia.org/P68080 and previous config saved to /var/cache/conftool/dbconfig/20240828-184923-ladsgroup.json
18:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2048.codfw.wmnet with reason: host reimage
18:36 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2048.codfw.wmnet with reason: host reimage
18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P68079 and previous config saved to /var/cache/conftool/dbconfig/20240828-183416-ladsgroup.json
18:19 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2048
18:19 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2048
18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P68078 and previous config saved to /var/cache/conftool/dbconfig/20240828-181908-ladsgroup.json
18:18 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2048
18:18 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2048.codfw.wmnet 164.0.192.10.in-addr.arpa 4.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
18:18 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2048.codfw.wmnet 164.0.192.10.in-addr.arpa 4.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
18:18 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:18 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2048 - akosiaris@cumin1002"
18:18 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2048 - akosiaris@cumin1002"
18:16 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2045.codfw.wmnet
18:16 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2045.codfw.wmnet
18:15 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
18:14 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker2048
18:14 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2048.codfw.wmnet with OS bullseye
18:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2294 to wikikube-worker2048
18:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2048
18:08 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2048
18:08 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:08 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2294 to wikikube-worker2048 - akosiaris@cumin1002"
18:08 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2294 to wikikube-worker2048 - akosiaris@cumin1002"
18:04 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
18:04 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2294 to wikikube-worker2048
18:04 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2045.codfw.wmnet with OS bullseye
18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T370903)', diff saved to https://phabricator.wikimedia.org/P68077 and previous config saved to /var/cache/conftool/dbconfig/20240828-180401-ladsgroup.json
18:00 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
17:57 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2294.codfw.wmnet
17:57 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2294.codfw.wmnet
17:48 ejegg: fundraising civicrm upgraded from e3aead7d to 916cad45
17:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T371742)', diff saved to https://phabricator.wikimedia.org/P68076 and previous config saved to /var/cache/conftool/dbconfig/20240828-174811-ladsgroup.json
17:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
17:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
17:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T371742)', diff saved to https://phabricator.wikimedia.org/P68075 and previous config saved to /var/cache/conftool/dbconfig/20240828-174749-ladsgroup.json
17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T370903)', diff saved to https://phabricator.wikimedia.org/P68074 and previous config saved to /var/cache/conftool/dbconfig/20240828-174514-ladsgroup.json
17:45 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2045.codfw.wmnet with reason: host reimage
17:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
17:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
17:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
17:42 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2045.codfw.wmnet with reason: host reimage
17:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P68073 and previous config saved to /var/cache/conftool/dbconfig/20240828-173242-ladsgroup.json
17:30 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@cb0bc4d]: (no justification provided) (duration: 00m 18s)
17:29 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@cb0bc4d]: (no justification provided)
17:27 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T370903)', diff saved to https://phabricator.wikimedia.org/P68072 and previous config saved to /var/cache/conftool/dbconfig/20240828-172653-ladsgroup.json
17:24 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2045.codfw.wmnet with OS bullseye
17:22 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:22 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:22 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P68071 and previous config saved to /var/cache/conftool/dbconfig/20240828-171735-ladsgroup.json
17:14 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P68070 and previous config saved to /var/cache/conftool/dbconfig/20240828-171146-ladsgroup.json
17:05 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:03 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:02 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T371742)', diff saved to https://phabricator.wikimedia.org/P68069 and previous config saved to /var/cache/conftool/dbconfig/20240828-170228-ladsgroup.json
17:02 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:01 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:00 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:59 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:59 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P68068 and previous config saved to /var/cache/conftool/dbconfig/20240828-165638-ladsgroup.json
16:51 topranks: add qos config to management firewalls T339850
16:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
16:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
16:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T370903)', diff saved to https://phabricator.wikimedia.org/P68067 and previous config saved to /var/cache/conftool/dbconfig/20240828-164131-ladsgroup.json
16:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
16:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2009.codfw.wmnet
16:35 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2009.codfw.wmnet
16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
16:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2009.codfw.wmnet
16:32 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2009.codfw.wmnet
16:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2009.codfw.wmnet
16:30 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2009.codfw.wmnet
16:26 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:26 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:24 hnowlan@deploy1003: Finished scap sync-world: Backport for timedmediahandler: revert using shellbox for commonswiki (T373517) (duration: 07m 13s)
16:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2127 (T370903)', diff saved to https://phabricator.wikimedia.org/P68066 and previous config saved to /var/cache/conftool/dbconfig/20240828-162239-ladsgroup.json
16:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
16:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
16:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1009.eqiad.wmnet with OS bookworm
16:20 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:20 hnowlan@deploy1003: hnowlan: Continuing with sync
16:20 hnowlan@deploy1003: hnowlan: Backport for timedmediahandler: revert using shellbox for commonswiki (T373517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:17 hnowlan@deploy1003: Started scap sync-world: Backport for timedmediahandler: revert using shellbox for commonswiki (T373517)
16:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
16:07 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2009.codfw.wmnet
16:06 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
16:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
16:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T370903)', diff saved to https://phabricator.wikimedia.org/P68065 and previous config saved to /var/cache/conftool/dbconfig/20240828-160354-ladsgroup.json
16:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1009.eqiad.wmnet with reason: host reimage
16:02 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
16:01 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
16:01 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
16:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
15:59 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1009.eqiad.wmnet with reason: host reimage
15:52 urandom: TRUNCATE-ing RESTBase tables (`{commons,enwiki,others,wikipedia}_T_mobileoZCBVtILw5eSrwi0VIGaFVSr2jY`) — T342148
15:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
15:49 claime: homer lsw1-b6-codfw* commit 'T372878'
15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P68063 and previous config saved to /var/cache/conftool/dbconfig/20240828-154846-ladsgroup.json
15:47 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@0b23c91]: Test Refine through Airflow (duration: 00m 11s)
15:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1009.eqiad.wmnet with OS bookworm
15:47 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@0b23c91]: Test Refine through Airflow
15:45 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-lab1001.eqiad.wmnet with OS bookworm
15:40 claime: homer cr*codfw* commit 'T372878'
15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2009.codfw.wmnet with OS bullseye
15:34 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:34 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P68062 and previous config saved to /var/cache/conftool/dbconfig/20240828-153338-ladsgroup.json
15:23 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
15:23 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/toolhub: sync
15:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1011.eqiad.wmnet with OS bookworm
15:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:22 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/toolhub: sync
15:22 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/toolhub: sync
15:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
15:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T370903)', diff saved to https://phabricator.wikimedia.org/P68061 and previous config saved to /var/cache/conftool/dbconfig/20240828-151831-ladsgroup.json
15:18 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
15:17 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
15:16 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
15:14 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T370903)', diff saved to https://phabricator.wikimedia.org/P68060 and previous config saved to /var/cache/conftool/dbconfig/20240828-151404-ladsgroup.json
15:14 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
15:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1223.eqiad.wmnet with reason: Maintenance
15:13 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
15:13 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
15:13 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1223.eqiad.wmnet with reason: Maintenance
15:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68059 and previous config saved to /var/cache/conftool/dbconfig/20240828-151342-ladsgroup.json
15:11 jclark@cumin1002: START - Cookbook sre.dns.netbox
15:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-lab1002
15:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1002
15:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1011.eqiad.wmnet with reason: host reimage
15:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1011.eqiad.wmnet with reason: host reimage
14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f9ac7a901f0>
14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2009
14:59 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2009
14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2009.codfw.wmnet 197.16.192.10.in-addr.arpa 7.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:59 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2009.codfw.wmnet 197.16.192.10.in-addr.arpa 7.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2009 - cgoubert@cumin1002"
14:59 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2009 - cgoubert@cumin1002"
14:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P68058 and previous config saved to /var/cache/conftool/dbconfig/20240828-145835-ladsgroup.json
14:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T371742)', diff saved to https://phabricator.wikimedia.org/P68057 and previous config saved to /var/cache/conftool/dbconfig/20240828-145651-ladsgroup.json
14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
14:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T371742)', diff saved to https://phabricator.wikimedia.org/P68056 and previous config saved to /var/cache/conftool/dbconfig/20240828-145629-ladsgroup.json
14:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f9ac7a901f0>
14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2009.codfw.wmnet with OS bullseye
14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2009.codfw.wmnet
14:54 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2009.codfw.wmnet
14:54 cgoubert@cumin1002: START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2009.codfw.wmnet
14:50 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1011.eqiad.wmnet with OS bookworm
14:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P68054 and previous config saved to /var/cache/conftool/dbconfig/20240828-144328-ladsgroup.json
14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P68053 and previous config saved to /var/cache/conftool/dbconfig/20240828-144122-ladsgroup.json
14:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1010.eqiad.wmnet with OS bookworm
14:36 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:35 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68052 and previous config saved to /var/cache/conftool/dbconfig/20240828-142821-ladsgroup.json
14:26 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:26 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P68051 and previous config saved to /var/cache/conftool/dbconfig/20240828-142615-ladsgroup.json
14:25 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
14:24 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T370903)', diff saved to https://phabricator.wikimedia.org/P68050 and previous config saved to /var/cache/conftool/dbconfig/20240828-142355-ladsgroup.json
14:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
14:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
14:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1212.eqiad.wmnet with reason: Maintenance
14:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1212.eqiad.wmnet with reason: Maintenance
14:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T370903)', diff saved to https://phabricator.wikimedia.org/P68049 and previous config saved to /var/cache/conftool/dbconfig/20240828-142315-ladsgroup.json
14:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1010.eqiad.wmnet with reason: host reimage
14:20 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:19 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:19 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
14:19 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:18 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:18 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1010.eqiad.wmnet with reason: host reimage
14:00 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
14:00 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
13:59 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
13:59 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
13:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1010.eqiad.wmnet with OS bookworm
13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P68046 and previous config saved to /var/cache/conftool/dbconfig/20240828-135300-ladsgroup.json
13:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2009.codfw.wmnet
13:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2009.codfw.wmnet
13:45 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
13:40 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
13:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from test-s1 to test-s1
13:38 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T370903)', diff saved to https://phabricator.wikimedia.org/P68045 and previous config saved to /var/cache/conftool/dbconfig/20240828-133753-ladsgroup.json
13:36 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from test-s1 to test-s1
13:36 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
13:36 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T370903)', diff saved to https://phabricator.wikimedia.org/P68044 and previous config saved to /var/cache/conftool/dbconfig/20240828-133346-ladsgroup.json
13:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1198.eqiad.wmnet with reason: Maintenance
13:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1198.eqiad.wmnet with reason: Maintenance
13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68043 and previous config saved to /var/cache/conftool/dbconfig/20240828-133323-ladsgroup.json
13:31 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from test-s1 to test-s1
13:31 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
13:31 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P68042 and previous config saved to /var/cache/conftool/dbconfig/20240828-131815-ladsgroup.json
13:10 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: sync
13:10 elukey@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: sync
13:04 topranks: rolling out config additions of qos schedulers and policers to all network devices T339850
13:03 godog: delete 2023 5m blocks from thanos - T351927
13:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P68041 and previous config saved to /var/cache/conftool/dbconfig/20240828-130308-ladsgroup.json
12:58 Dreamy_Jazz: Started MediaModeration scan on enwiki, time limited to 24hrs - https://wikitech.wikimedia.org/wiki/MediaModeration
12:57 sukhe: sudo ipmitool -I lanplus -H "puppetserver1002.mgmt.eqiad.wmnet" -U root -E chassis power cycle
12:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68040 and previous config saved to /var/cache/conftool/dbconfig/20240828-124801-ladsgroup.json
12:45 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
12:44 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
12:41 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:40 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:39 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:39 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from test-s1 to test-s1
12:29 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s1 to test-s1
12:28 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.finalize (exit_code=99) for the switch from test-s1 to test-s1
12:28 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s1 to test-s1
12:27 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.finalize (exit_code=99) for the switch from test-s1 to test-s1
12:27 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s1 to test-s1
12:23 MichaelG_WMF: T371228 running foreachwikiindblist growthexperiments ./extensions/CommunityConfiguration/maintenance/setVersionData.php HelpPanel 1.0.0
12:22 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.finalize (exit_code=97) for the switch from test-s1 to test-s1
12:22 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from test-s1 to test-s1
12:19 MichaelG_WMF: T371228 running mwscript --wiki testwiki ./extensions/CommunityConfiguration/maintenance/setVersionData.php HelpPanel 1.0.0
11:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T371742)', diff saved to https://phabricator.wikimedia.org/P68039 and previous config saved to /var/cache/conftool/dbconfig/20240828-115123-ladsgroup.json
11:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
11:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
11:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
11:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
11:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T371742)', diff saved to https://phabricator.wikimedia.org/P68038 and previous config saved to /var/cache/conftool/dbconfig/20240828-115057-ladsgroup.json
11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P68037 and previous config saved to /var/cache/conftool/dbconfig/20240828-114745-ladsgroup.json
11:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
11:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T370903)', diff saved to https://phabricator.wikimedia.org/P68036 and previous config saved to /var/cache/conftool/dbconfig/20240828-114722-ladsgroup.json
11:44 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
11:43 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
11:43 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:42 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:42 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:42 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:41 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:40 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
11:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P68035 and previous config saved to /var/cache/conftool/dbconfig/20240828-113549-ladsgroup.json
11:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P68034 and previous config saved to /var/cache/conftool/dbconfig/20240828-113215-ladsgroup.json
11:23 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
11:22 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
11:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P68033 and previous config saved to /var/cache/conftool/dbconfig/20240828-112042-ladsgroup.json
11:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P68032 and previous config saved to /var/cache/conftool/dbconfig/20240828-111708-ladsgroup.json
11:17 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
11:14 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:12 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
11:10 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Maintain ranked order of candidates in STV vote summary (T373499) (duration: 06m 44s)
11:06 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
11:06 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
11:06 dreamyjazz@deploy1003: dreamyjazz: Backport for Maintain ranked order of candidates in STV vote summary (T373499) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T371742)', diff saved to https://phabricator.wikimedia.org/P68031 and previous config saved to /var/cache/conftool/dbconfig/20240828-110535-ladsgroup.json
11:03 dreamyjazz@deploy1003: Started scap sync-world: Backport for Maintain ranked order of candidates in STV vote summary (T373499)
11:02 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
11:02 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
11:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T370903)', diff saved to https://phabricator.wikimedia.org/P68030 and previous config saved to /var/cache/conftool/dbconfig/20240828-110200-ladsgroup.json
10:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T370903)', diff saved to https://phabricator.wikimedia.org/P68029 and previous config saved to /var/cache/conftool/dbconfig/20240828-105757-ladsgroup.json
10:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
10:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
10:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T370903)', diff saved to https://phabricator.wikimedia.org/P68028 and previous config saved to /var/cache/conftool/dbconfig/20240828-105735-ladsgroup.json
10:50 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.7.0 with updated plugin - cmooney@cumin1002
10:48 ladsgroup@deploy1003: Finished scap sync-world: Backport for Set ruwiki to non simple UI (T372694) (duration: 10m 48s)
10:44 ladsgroup@deploy1003: ladsgroup: Continuing with sync
10:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P68027 and previous config saved to /var/cache/conftool/dbconfig/20240828-104228-ladsgroup.json
10:42 ladsgroup@deploy1003: ladsgroup: Backport for Set ruwiki to non simple UI (T372694) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:41 godog: start prometheus2005 bookworm upgrade - T326657
10:40 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.7.0 with updated plugin - cmooney@cumin1002
10:38 ladsgroup@deploy1003: Started scap sync-world: Backport for Set ruwiki to non simple UI (T372694)
10:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
10:27 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
10:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
10:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P68026 and previous config saved to /var/cache/conftool/dbconfig/20240828-102721-ladsgroup.json
10:24 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
10:12 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from test-s1 to test-s1
10:12 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
10:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T370903)', diff saved to https://phabricator.wikimedia.org/P68025 and previous config saved to /var/cache/conftool/dbconfig/20240828-101214-ladsgroup.json
10:11 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the switch from test-s1 to test-s1
10:11 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T370903)', diff saved to https://phabricator.wikimedia.org/P68024 and previous config saved to /var/cache/conftool/dbconfig/20240828-100803-ladsgroup.json
10:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:07 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the switch from test-s1 to test-s1
10:07 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
10:05 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the switch from test-s1 to test-s1
10:05 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from test-s1 to test-s1
10:01 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
09:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
09:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
09:57 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the (test) switch
09:57 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
09:57 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
09:54 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
09:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
09:49 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
09:48 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
09:40 godog: start prometheus1005 bookworm upgrade - T326657
09:36 claime: homer 'cr*codfw*' commit 'T372878'
09:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2043.codfw.wmnet
09:35 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2043.codfw.wmnet
09:35 claime: pooling wikikube-worker2043.codfw.wmnet - T372878
09:34 claime: homer 'lsw1-a3-codfw*' commit T372878
09:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the (test) switch
09:02 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
08:52 jayme: running homer commit on cr*codfw* - T372878
08:50 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2047.codfw.wmnet
08:50 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2047.codfw.wmnet
08:48 jayme: running homer commit on lsw1-a6-codfw* - T372878
08:46 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
08:45 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
08:45 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2047.codfw.wmnet with OS bullseye
08:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T371742)', diff saved to https://phabricator.wikimedia.org/P68023 and previous config saved to /var/cache/conftool/dbconfig/20240828-084045-ladsgroup.json
08:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
08:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
08:37 hashar@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.20 refs T366965
08:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2047.codfw.wmnet with reason: host reimage
08:22 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2047.codfw.wmnet with reason: host reimage
08:21 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
08:21 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
08:21 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the (test) switch
08:05 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f4a5bda6340>
08:05 jayme@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2047
08:04 jayme@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2047
08:04 jayme@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2047.codfw.wmnet 196.0.192.10.in-addr.arpa 6.9.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
08:04 jayme@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2047.codfw.wmnet 196.0.192.10.in-addr.arpa 6.9.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
08:04 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:04 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2047 - jayme@cumin1002"
08:04 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2047 - jayme@cumin1002"
07:59 jayme@cumin1002: START - Cookbook sre.dns.netbox
07:58 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
07:58 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the (test) switch
07:58 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
07:54 jayme@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f4a5bda6340>
07:54 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2047.codfw.wmnet with OS bullseye
07:54 jayme@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2047.codfw.wmnet on all recursors
07:54 jayme@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2047.codfw.wmnet on all recursors
07:53 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2007 to wikikube-worker2047
07:52 jayme@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2047
07:52 jayme@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2047
07:52 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:52 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2007 to wikikube-worker2047 - jayme@cumin1002"
07:51 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2007 to wikikube-worker2047 - jayme@cumin1002"
07:44 jayme@cumin1002: START - Cookbook sre.dns.netbox
07:44 jayme@cumin1002: START - Cookbook sre.hosts.rename from kubernetes2007 to wikikube-worker2047
07:31 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2007.codfw.wmnet
07:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2007.codfw.wmnet
06:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
06:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
06:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T371742)', diff saved to https://phabricator.wikimedia.org/P68022 and previous config saved to /var/cache/conftool/dbconfig/20240828-062759-ladsgroup.json
06:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P68021 and previous config saved to /var/cache/conftool/dbconfig/20240828-061252-ladsgroup.json
06:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
06:01 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
05:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
05:59 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
05:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P68020 and previous config saved to /var/cache/conftool/dbconfig/20240828-055744-ladsgroup.json
05:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T371742)', diff saved to https://phabricator.wikimedia.org/P68019 and previous config saved to /var/cache/conftool/dbconfig/20240828-054237-ladsgroup.json
03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T371742)', diff saved to https://phabricator.wikimedia.org/P68018 and previous config saved to /var/cache/conftool/dbconfig/20240828-033211-ladsgroup.json
03:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
03:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T371742)', diff saved to https://phabricator.wikimedia.org/P68017 and previous config saved to /var/cache/conftool/dbconfig/20240828-033149-ladsgroup.json
03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P68016 and previous config saved to /var/cache/conftool/dbconfig/20240828-031642-ladsgroup.json
03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P68015 and previous config saved to /var/cache/conftool/dbconfig/20240828-030135-ladsgroup.json
02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T371742)', diff saved to https://phabricator.wikimedia.org/P68014 and previous config saved to /var/cache/conftool/dbconfig/20240828-024627-ladsgroup.json
02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T371742)', diff saved to https://phabricator.wikimedia.org/P68013 and previous config saved to /var/cache/conftool/dbconfig/20240828-020145-ladsgroup.json
02:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
02:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68012 and previous config saved to /var/cache/conftool/dbconfig/20240828-013903-ladsgroup.json
01:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P68011 and previous config saved to /var/cache/conftool/dbconfig/20240828-012356-ladsgroup.json
01:21 ejegg: payments-wiki upgraded from f6a3be41 to 54988ad9
01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P68010 and previous config saved to /var/cache/conftool/dbconfig/20240828-010849-ladsgroup.json
00:59 ejegg: payments-wiki upgraded from 0455b791 to f6a3be41
00:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68009 and previous config saved to /var/cache/conftool/dbconfig/20240828-005342-ladsgroup.json
00:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T370903)', diff saved to https://phabricator.wikimedia.org/P68008 and previous config saved to /var/cache/conftool/dbconfig/20240828-004702-ladsgroup.json
00:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2218.codfw.wmnet with reason: Maintenance
00:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2218.codfw.wmnet with reason: Maintenance
00:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T370903)', diff saved to https://phabricator.wikimedia.org/P68007 and previous config saved to /var/cache/conftool/dbconfig/20240828-004639-ladsgroup.json
00:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P68006 and previous config saved to /var/cache/conftool/dbconfig/20240828-003132-ladsgroup.json
00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P68005 and previous config saved to /var/cache/conftool/dbconfig/20240828-001625-ladsgroup.json
00:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
00:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
00:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T371742)', diff saved to https://phabricator.wikimedia.org/P68004 and previous config saved to /var/cache/conftool/dbconfig/20240828-001214-ladsgroup.json
00:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T370903)', diff saved to https://phabricator.wikimedia.org/P68003 and previous config saved to /var/cache/conftool/dbconfig/20240828-000117-ladsgroup.json

2024-08-27

23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P68002 and previous config saved to /var/cache/conftool/dbconfig/20240827-235707-ladsgroup.json
23:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T370903)', diff saved to https://phabricator.wikimedia.org/P68001 and previous config saved to /var/cache/conftool/dbconfig/20240827-235426-ladsgroup.json
23:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2208.codfw.wmnet with reason: Maintenance
23:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2208.codfw.wmnet with reason: Maintenance
23:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
23:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P68000 and previous config saved to /var/cache/conftool/dbconfig/20240827-234200-ladsgroup.json
23:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
23:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
23:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T370903)', diff saved to https://phabricator.wikimedia.org/P67999 and previous config saved to /var/cache/conftool/dbconfig/20240827-233854-ladsgroup.json
23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T371742)', diff saved to https://phabricator.wikimedia.org/P67998 and previous config saved to /var/cache/conftool/dbconfig/20240827-232653-ladsgroup.json
23:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P67997 and previous config saved to /var/cache/conftool/dbconfig/20240827-232346-ladsgroup.json
23:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P67996 and previous config saved to /var/cache/conftool/dbconfig/20240827-230839-ladsgroup.json
22:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T370903)', diff saved to https://phabricator.wikimedia.org/P67995 and previous config saved to /var/cache/conftool/dbconfig/20240827-225332-ladsgroup.json
22:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T370903)', diff saved to https://phabricator.wikimedia.org/P67994 and previous config saved to /var/cache/conftool/dbconfig/20240827-224542-ladsgroup.json
22:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
22:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
22:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67993 and previous config saved to /var/cache/conftool/dbconfig/20240827-224520-ladsgroup.json
22:34 cstone: civicrm upgraded from f70d753c to e3aead7d
22:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P67992 and previous config saved to /var/cache/conftool/dbconfig/20240827-223013-ladsgroup.json
22:15 swfrench-wmf: running homer 'cr*codfw*' commit 'T372878'
22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P67991 and previous config saved to /var/cache/conftool/dbconfig/20240827-221506-ladsgroup.json
22:07 swfrench-wmf: pooled / uncordoned wikikube-worker2046.codfw.wmnet - T372878
22:06 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2046.codfw.wmnet
22:06 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2046.codfw.wmnet
22:04 swfrench-wmf: Running homer 'lsw1-a8-codfw*' commit 'T372878'
22:01 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2046.codfw.wmnet with OS bullseye
22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67990 and previous config saved to /var/cache/conftool/dbconfig/20240827-215958-ladsgroup.json
21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67989 and previous config saved to /var/cache/conftool/dbconfig/20240827-215230-ladsgroup.json
21:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
21:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T370903)', diff saved to https://phabricator.wikimedia.org/P67988 and previous config saved to /var/cache/conftool/dbconfig/20240827-215208-ladsgroup.json
21:41 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2046.codfw.wmnet with reason: host reimage
21:38 swfrench@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2046.codfw.wmnet with reason: host reimage
21:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P67987 and previous config saved to /var/cache/conftool/dbconfig/20240827-213700-ladsgroup.json
21:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P67986 and previous config saved to /var/cache/conftool/dbconfig/20240827-212153-ladsgroup.json
21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f46e8b0b1c0>
21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2046
21:20 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2046
21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2046.codfw.wmnet 69.0.192.10.in-addr.arpa 9.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:20 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2046.codfw.wmnet 69.0.192.10.in-addr.arpa 9.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:20 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2046 - swfrench@cumin2002"
21:20 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2046 - swfrench@cumin2002"
21:15 swfrench@cumin2002: START - Cookbook sre.dns.netbox
21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T371742)', diff saved to https://phabricator.wikimedia.org/P67985 and previous config saved to /var/cache/conftool/dbconfig/20240827-211538-ladsgroup.json
21:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1249.eqiad.wmnet with reason: Maintenance
21:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1249.eqiad.wmnet with reason: Maintenance
21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T371742)', diff saved to https://phabricator.wikimedia.org/P67984 and previous config saved to /var/cache/conftool/dbconfig/20240827-211516-ladsgroup.json
21:15 swfrench@cumin2002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f46e8b0b1c0>
21:14 swfrench@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2046.codfw.wmnet with OS bullseye
21:13 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2046.codfw.wmnet on all recursors
21:13 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2046.codfw.wmnet on all recursors
21:12 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2026 to wikikube-worker2046
21:12 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2046
21:11 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2046
21:11 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:11 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2026 to wikikube-worker2046 - swfrench@cumin2002"
21:11 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2026 to wikikube-worker2046 - swfrench@cumin2002"
21:07 swfrench@cumin2002: START - Cookbook sre.dns.netbox
21:07 swfrench@cumin2002: START - Cookbook sre.hosts.rename from kubernetes2026 to wikikube-worker2046
21:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T370903)', diff saved to https://phabricator.wikimedia.org/P67983 and previous config saved to /var/cache/conftool/dbconfig/20240827-210646-ladsgroup.json
21:06 zabe@deploy1003: Finished scap sync-world: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis (T372789), Rollback Parsoid+Kartographer rollout on hewiki and commons (T373454 T373460) (duration: 10m 55s)
21:01 zabe@deploy1003: ihurbain, zabe, cscott: Continuing with sync
21:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67982 and previous config saved to /var/cache/conftool/dbconfig/20240827-210008-ladsgroup.json
20:59 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2026.codfw.wmnet
20:58 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2026.codfw.wmnet
20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T370903)', diff saved to https://phabricator.wikimedia.org/P67981 and previous config saved to /var/cache/conftool/dbconfig/20240827-205855-ladsgroup.json
20:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
20:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
20:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
20:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T370903)', diff saved to https://phabricator.wikimedia.org/P67980 and previous config saved to /var/cache/conftool/dbconfig/20240827-205817-ladsgroup.json
20:57 zabe@deploy1003: ihurbain, zabe, cscott: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis (T372789), Rollback Parsoid+Kartographer rollout on hewiki and commons (T373454 T373460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:55 zabe@deploy1003: Started scap sync-world: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis (T372789), Rollback Parsoid+Kartographer rollout on hewiki and commons (T373454 T373460)
20:53 zabe@deploy1003: Finished scap sync-world: Backport for Remove warning on non-existing category (T373454), Remove warning on non-existing category (T373454) (duration: 08m 11s)
20:53 mstyles@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
20:53 mstyles@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
20:52 mstyles@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
20:52 mstyles@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
20:52 mstyles@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
20:52 mstyles@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
20:52 mstyles@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
20:51 mstyles@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
20:51 mstyles@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
20:49 mstyles@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
20:49 zabe@deploy1003: cscott, zabe: Continuing with sync
20:48 zabe@deploy1003: cscott, zabe: Backport for Remove warning on non-existing category (T373454), Remove warning on non-existing category (T373454) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:45 zabe@deploy1003: Started scap sync-world: Backport for Remove warning on non-existing category (T373454), Remove warning on non-existing category (T373454)
20:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67979 and previous config saved to /var/cache/conftool/dbconfig/20240827-204501-ladsgroup.json
20:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P67978 and previous config saved to /var/cache/conftool/dbconfig/20240827-204310-ladsgroup.json
20:38 zabe@deploy1003: Finished scap sync-world: Backport for Revert "Allow gadget/browser extension extensibility of empty search state" (T373463), Tweak styling of compact Parsoid indicator (T372789) (duration: 13m 23s)
20:33 zabe@deploy1003: cscott, zabe, jdlrobson: Continuing with sync
20:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T371742)', diff saved to https://phabricator.wikimedia.org/P67977 and previous config saved to /var/cache/conftool/dbconfig/20240827-202954-ladsgroup.json
20:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P67976 and previous config saved to /var/cache/conftool/dbconfig/20240827-202803-ladsgroup.json
20:27 zabe@deploy1003: cscott, zabe, jdlrobson: Backport for Revert "Allow gadget/browser extension extensibility of empty search state" (T373463), Tweak styling of compact Parsoid indicator (T372789) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:27 bking@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2043.codfw.wmnet
20:27 bking@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2043.codfw.wmnet
20:24 zabe@deploy1003: Started scap sync-world: Backport for Revert "Allow gadget/browser extension extensibility of empty search state" (T373463), Tweak styling of compact Parsoid indicator (T372789)
20:22 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:22 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:22 zabe@deploy1003: Finished scap sync-world: Backport for Disable mobile Watchlist on wikidata since its broken (T263633) (duration: 09m 39s)
20:17 zabe@deploy1003: jdlrobson, zabe: Continuing with sync
20:15 zabe@deploy1003: jdlrobson, zabe: Backport for Disable mobile Watchlist on wikidata since its broken (T263633) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T370903)', diff saved to https://phabricator.wikimedia.org/P67975 and previous config saved to /var/cache/conftool/dbconfig/20240827-201256-ladsgroup.json
20:12 zabe@deploy1003: Started scap sync-world: Backport for Disable mobile Watchlist on wikidata since its broken (T263633)
20:12 zabe@deploy1003: Finished scap sync-world: Backport for Turn account vanishing contact form into a redirect. (T372828), Revert "[svwikt] Add a temporary logo for the 100.000 pages" (T364247) (duration: 11m 28s)
20:05 zabe@deploy1003: dbrant, zabe, pppery: Continuing with sync
20:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T370903)', diff saved to https://phabricator.wikimedia.org/P67974 and previous config saved to /var/cache/conftool/dbconfig/20240827-200459-ladsgroup.json
20:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
20:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T370903)', diff saved to https://phabricator.wikimedia.org/P67973 and previous config saved to /var/cache/conftool/dbconfig/20240827-200437-ladsgroup.json
20:04 zabe@deploy1003: dbrant, zabe, pppery: Backport for Turn account vanishing contact form into a redirect. (T372828), Revert "[svwikt] Add a temporary logo for the 100.000 pages" (T364247) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:01 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:01 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
20:00 zabe@deploy1003: Started scap sync-world: Backport for Turn account vanishing contact form into a redirect. (T372828), Revert "[svwikt] Add a temporary logo for the 100.000 pages" (T364247)
19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P67972 and previous config saved to /var/cache/conftool/dbconfig/20240827-194930-ladsgroup.json
19:44 zabe@deploy1003: Finished scap sync-world: Backport for Update uzwiki logo (T370165) (duration: 17m 07s)
19:38 zabe@deploy1003: zabe: Continuing with sync
19:37 zabe@deploy1003: zabe: Backport for Update uzwiki logo (T370165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P67971 and previous config saved to /var/cache/conftool/dbconfig/20240827-193424-ladsgroup.json
19:27 zabe@deploy1003: Started scap sync-world: Backport for Update uzwiki logo (T370165)
19:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T370903)', diff saved to https://phabricator.wikimedia.org/P67970 and previous config saved to /var/cache/conftool/dbconfig/20240827-191915-ladsgroup.json
19:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T370903)', diff saved to https://phabricator.wikimedia.org/P67969 and previous config saved to /var/cache/conftool/dbconfig/20240827-191116-ladsgroup.json
19:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
19:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
19:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T370903)', diff saved to https://phabricator.wikimedia.org/P67968 and previous config saved to /var/cache/conftool/dbconfig/20240827-191053-ladsgroup.json
19:01 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-lab1001
19:01 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1001
19:01 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
19:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
18:58 jclark@cumin1002: START - Cookbook sre.dns.netbox
18:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P67967 and previous config saved to /var/cache/conftool/dbconfig/20240827-185546-ladsgroup.json
18:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P67966 and previous config saved to /var/cache/conftool/dbconfig/20240827-184039-ladsgroup.json
18:38 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:38 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
18:33 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:33 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T370903)', diff saved to https://phabricator.wikimedia.org/P67965 and previous config saved to /var/cache/conftool/dbconfig/20240827-182531-ladsgroup.json
18:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2121 (T370903)', diff saved to https://phabricator.wikimedia.org/P67964 and previous config saved to /var/cache/conftool/dbconfig/20240827-181732-ladsgroup.json
18:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
18:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
18:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
18:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T370903)', diff saved to https://phabricator.wikimedia.org/P67963 and previous config saved to /var/cache/conftool/dbconfig/20240827-181653-ladsgroup.json
18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T371742)', diff saved to https://phabricator.wikimedia.org/P67962 and previous config saved to /var/cache/conftool/dbconfig/20240827-181020-ladsgroup.json
18:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1248.eqiad.wmnet with reason: Maintenance
18:10 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-lab1001
18:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1248.eqiad.wmnet with reason: Maintenance
18:10 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1001
18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T371742)', diff saved to https://phabricator.wikimedia.org/P67961 and previous config saved to /var/cache/conftool/dbconfig/20240827-180958-ladsgroup.json
18:06 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1009
18:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1009
18:05 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-lab1002
18:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1002
18:05 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ml-lab1001
18:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-lab1001
18:05 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve1011
18:04 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve1011
18:04 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve1010
18:04 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve1010
18:03 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve1009
18:02 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve1009
18:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P67960 and previous config saved to /var/cache/conftool/dbconfig/20240827-180146-ladsgroup.json
17:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67959 and previous config saved to /var/cache/conftool/dbconfig/20240827-175451-ladsgroup.json
17:54 ryankemper: T364368 Our LVS operation is done; I've enabled and run puppet on the remaining lvs hosts
17:50 ryankemper: T364368 Ran puppet on `A:lvs-low-traffic-codfw`, restarted `pybal.service`, and cleared away old ipvs entries for `10.2.1.33:80` and `10.2.1.36:80`
17:47 ryankemper: T364368 Ran puppet on `A:lvs-secondary-codfw`, restarted `pybal.service`, and cleared away old ipvs entries for `10.2.1.33:80` and `10.2.1.36:80`
17:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P67957 and previous config saved to /var/cache/conftool/dbconfig/20240827-174639-ladsgroup.json
17:43 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
17:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
17:42 ryankemper: Typo, meant to say forced recheck on `lvs1019` to clear alert
17:41 ryankemper: Forced recheck on lvs2019 to clear alert
17:40 ryankemper: T364368 Cleared away old ipvs entries for `10.2.2.33:80` and `10.2.2.36:80`
17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67956 and previous config saved to /var/cache/conftool/dbconfig/20240827-173944-ladsgroup.json
17:38 jclark@cumin1002: START - Cookbook sre.dns.netbox
17:37 ryankemper: T364368 Ran puppet on `A:lvs-low-traffic-eqiad` and restarted `pybal.service`
17:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T370903)', diff saved to https://phabricator.wikimedia.org/P67954 and previous config saved to /var/cache/conftool/dbconfig/20240827-173132-ladsgroup.json
17:30 sukhe: force recheck on Icinga for lvs1020
17:30 sukhe: sukhe@lvs1020:~$ sudo ipvsadm --delete-service --tcp-service 10.2.2.33:80
17:29 sukhe: sukhe@lvs1020:~$ sudo ipvsadm --delete-service --tcp-service 10.2.2.36:80
17:24 ryankemper: T364368 `ryankemper@cumin2002:~$ sudo cumin 'A:lvs-secondary-eqiad' 'systemctl status pybal.service'`
17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T371742)', diff saved to https://phabricator.wikimedia.org/P67953 and previous config saved to /var/cache/conftool/dbconfig/20240827-172436-ladsgroup.json
17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T370903)', diff saved to https://phabricator.wikimedia.org/P67952 and previous config saved to /var/cache/conftool/dbconfig/20240827-172401-ladsgroup.json
17:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1227.eqiad.wmnet with reason: Maintenance
17:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1227.eqiad.wmnet with reason: Maintenance
17:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T370903)', diff saved to https://phabricator.wikimedia.org/P67951 and previous config saved to /var/cache/conftool/dbconfig/20240827-172339-ladsgroup.json
17:13 ryankemper: T364368 Ran puppet on `A:lvs-secondary-eqiad` and restarted `pybal.service`
17:08 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2045.codfw.wmnet with OS bullseye
17:08 ryankemper: T364368 Disabled puppet on all lvs hosts in preparation for rolling restart
17:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P67950 and previous config saved to /var/cache/conftool/dbconfig/20240827-170832-ladsgroup.json
16:56 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5002.eqsin.wmnet
16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P67949 and previous config saved to /var/cache/conftool/dbconfig/20240827-165325-ladsgroup.json
16:50 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
16:45 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
16:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ml-lab servers - jclark@cumin1002"
16:42 jclark@cumin1002: START - Cookbook sre.dns.netbox
16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T370903)', diff saved to https://phabricator.wikimedia.org/P67948 and previous config saved to /var/cache/conftool/dbconfig/20240827-163817-ladsgroup.json
16:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T367856)', diff saved to https://phabricator.wikimedia.org/P67947 and previous config saved to /var/cache/conftool/dbconfig/20240827-163407-marostegui.json
16:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2166.codfw.wmnet with reason: Maintenance
16:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2166.codfw.wmnet with reason: Maintenance
16:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T367856)', diff saved to https://phabricator.wikimedia.org/P67946 and previous config saved to /var/cache/conftool/dbconfig/20240827-163345-marostegui.json
16:25 denisse: Start prometheus5002 Bookworm upgrade - T326657
16:21 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
16:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P67945 and previous config saved to /var/cache/conftool/dbconfig/20240827-161837-marostegui.json
16:17 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
16:13 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2044.codfw.wmnet
16:13 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2044.codfw.wmnet
16:12 kamila_: ran homer to add wikikube-worker2044 T372878
16:05 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2044.codfw.wmnet with OS bullseye
16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T370903)', diff saved to https://phabricator.wikimedia.org/P67944 and previous config saved to /var/cache/conftool/dbconfig/20240827-160403-ladsgroup.json
16:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
16:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T370903)', diff saved to https://phabricator.wikimedia.org/P67943 and previous config saved to /var/cache/conftool/dbconfig/20240827-160341-ladsgroup.json
16:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P67942 and previous config saved to /var/cache/conftool/dbconfig/20240827-160330-marostegui.json
16:03 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2037.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:59 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2037.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:58 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2036.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:57 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2036.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:57 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2035.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:54 denisse: Start prometheus4002 Bookworm upgrade - T326657
15:52 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2035.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:52 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f7528213c70>
15:52 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2045
15:51 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2034.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:50 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2045
15:50 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2045.codfw.wmnet 163.0.192.10.in-addr.arpa 3.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:50 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2045.codfw.wmnet 163.0.192.10.in-addr.arpa 3.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:50 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:50 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2045 - akosiaris@cumin1002"
15:50 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2045 - akosiaris@cumin1002"
15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P67941 and previous config saved to /var/cache/conftool/dbconfig/20240827-154834-ladsgroup.json
15:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T367856)', diff saved to https://phabricator.wikimedia.org/P67940 and previous config saved to /var/cache/conftool/dbconfig/20240827-154823-marostegui.json
15:46 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2034.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:45 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2033.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:45 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2044.codfw.wmnet with reason: host reimage
15:44 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
15:43 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f7528213c70>
15:43 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2045.codfw.wmnet with OS bullseye
15:43 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2293 to wikikube-worker2045
15:42 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2044.codfw.wmnet with reason: host reimage
15:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2045
15:42 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2045
15:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2293 to wikikube-worker2045 - akosiaris@cumin1002"
15:39 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2033.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:39 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2293 to wikikube-worker2045 - akosiaris@cumin1002"
15:39 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2029.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:36 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2029.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:35 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
15:35 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2293 to wikikube-worker2045
15:35 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2028.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:33 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2028.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P67939 and previous config saved to /var/cache/conftool/dbconfig/20240827-153327-ladsgroup.json
15:31 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::data and logs*2027.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:29 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::data and logs*2027.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
15:27 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f958d5462b0>
15:27 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2044
15:26 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2044
15:26 kamila@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2044.codfw.wmnet 207.0.192.10.in-addr.arpa 7.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:26 kamila@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2044.codfw.wmnet 207.0.192.10.in-addr.arpa 7.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:26 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:26 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2044 - kamila@cumin1002"
15:26 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2044 - kamila@cumin1002"
15:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:23 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:22 kamila@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f958d5462b0>
15:22 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2044.codfw.wmnet with OS bullseye
15:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 100%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67937 and previous config saved to /var/cache/conftool/dbconfig/20240827-152031-arnaudb.json
15:20 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2019 to wikikube-worker2044
15:19 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2044
15:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:19 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2044
15:19 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:19 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2019 to wikikube-worker2044 - kamila@cumin1002"
15:19 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2019 to wikikube-worker2044 - kamila@cumin1002"
15:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T370903)', diff saved to https://phabricator.wikimedia.org/P67936 and previous config saved to /var/cache/conftool/dbconfig/20240827-151819-ladsgroup.json
15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T370903)', diff saved to https://phabricator.wikimedia.org/P67935 and previous config saved to /var/cache/conftool/dbconfig/20240827-151610-ladsgroup.json
15:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T370903)', diff saved to https://phabricator.wikimedia.org/P67934 and previous config saved to /var/cache/conftool/dbconfig/20240827-151548-ladsgroup.json
15:15 kamila@cumin1002: START - Cookbook sre.dns.netbox
15:15 kamila@cumin1002: START - Cookbook sre.hosts.rename from kubernetes2019 to wikikube-worker2044
15:11 elukey: restart httpd and librenms-syslog.service on netmon1003 for libaom upgrades
15:11 elukey: restart httpd on crm2001 for libaom upgrades
15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T371742)', diff saved to https://phabricator.wikimedia.org/P67933 and previous config saved to /var/cache/conftool/dbconfig/20240827-150952-ladsgroup.json
15:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1247.eqiad.wmnet with reason: Maintenance
15:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1247.eqiad.wmnet with reason: Maintenance
15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 75%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67932 and previous config saved to /var/cache/conftool/dbconfig/20240827-150525-arnaudb.json
15:02 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2003.codfw.wmnet
15:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P67931 and previous config saved to /var/cache/conftool/dbconfig/20240827-150041-ladsgroup.json
14:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2232.codfw.wmnet with OS bookworm
14:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2231.codfw.wmnet with OS bookworm
14:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2230.codfw.wmnet with OS bookworm
14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 50%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67930 and previous config saved to /var/cache/conftool/dbconfig/20240827-145020-arnaudb.json
14:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P67929 and previous config saved to /var/cache/conftool/dbconfig/20240827-144534-ladsgroup.json
14:44 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2293.codfw.wmnet
14:41 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2293.codfw.wmnet
14:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wikikube-ctrl2003.codfw.wmnet with reason: running provision again
14:41 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on wikikube-ctrl2003.codfw.wmnet with reason: running provision again
14:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2232.codfw.wmnet with reason: host reimage
14:40 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=wikikube-ctrl2003.codfw.wmnet
14:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2231.codfw.wmnet with reason: host reimage
14:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 25%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67928 and previous config saved to /var/cache/conftool/dbconfig/20240827-143514-arnaudb.json
14:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
14:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2231.codfw.wmnet with reason: host reimage
14:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2232.codfw.wmnet with reason: host reimage
14:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T370903)', diff saved to https://phabricator.wikimedia.org/P67927 and previous config saved to /var/cache/conftool/dbconfig/20240827-143027-ladsgroup.json
14:29 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:29 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding AAAA field to wdqs101[1-3] and wdqs200[7-8] - brouberol@cumin1002"
14:29 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding AAAA field to wdqs101[1-3] and wdqs200[7-8] - brouberol@cumin1002"
14:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on db2186.codfw.wmnet with reason: Schema change
14:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on db2186.codfw.wmnet with reason: Schema change
14:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2186.codfw.wmnet with reason: Schema change
14:26 brouberol@cumin1002: START - Cookbook sre.dns.netbox
14:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2186.codfw.wmnet with reason: Schema change
14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T370903)', diff saved to https://phabricator.wikimedia.org/P67926 and previous config saved to /var/cache/conftool/dbconfig/20240827-142516-ladsgroup.json
14:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
14:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T370903)', diff saved to https://phabricator.wikimedia.org/P67925 and previous config saved to /var/cache/conftool/dbconfig/20240827-142454-ladsgroup.json
14:24 marostegui: Update zarcillo db for pc4 master T373340
14:20 akosiaris: T372878 uncordon wikikube-worker2043
14:20 akosiaris: T327878 uncordon wikikube-worker2043
14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 15%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67924 and previous config saved to /var/cache/conftool/dbconfig/20240827-142009-arnaudb.json
14:18 tappof@cumin2002: END (FAIL) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=99) rolling restart_daemons on P{O:logging::opensearch::data and logs*.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Switch pc4 master to pc2015 T373340', diff saved to https://phabricator.wikimedia.org/P67923 and previous config saved to /var/cache/conftool/dbconfig/20240827-141845-marostegui.json
14:18 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2232.codfw.wmnet with OS bookworm
14:18 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2231.codfw.wmnet with OS bookworm
14:17 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2230.codfw.wmnet with OS bookworm
14:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2015-2016].codfw.wmnet,pc[1015-1016].eqiad.wmnet with reason: Switchover
14:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2015-2016].codfw.wmnet,pc[1015-1016].eqiad.wmnet with reason: Switchover
13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P67920 and previous config saved to /var/cache/conftool/dbconfig/20240827-135440-ladsgroup.json
13:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 3%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67919 and previous config saved to /var/cache/conftool/dbconfig/20240827-134958-arnaudb.json
13:48 XioNoX: add bgpalerter to bookworm-wikimedia apt repo - T372909
13:47 XioNoX: add routinator to bookworm-wikimedia apt repo - T372909
13:46 zabe: zabe@mwmaint1002:~$ foreachwikiindblist private wrapOldPasswords.php --type BEP --update # T91917
13:46 zabe: zabe@mwmaint1002:~$ foreachwikiindblist fishbowl wrapOldPasswords.php --type BEP --update # T91917
13:45 zabe: zabe@mwmaint1002:~$ foreachwikiindblist private sql.php --query "UPDATE user SET user_password = CONCAT(':B:', user_id, ':', user_password) WHERE user_password RLIKE '^[0-9a-f]{32}$';" # T91917
13:44 zabe: zabe@mwmaint1002:~$ foreachwikiindblist fishbowl sql.php --query "UPDATE user SET user_password = CONCAT(':B:', user_id, ':', user_password) WHERE user_password RLIKE '^[0-9a-f]{32}$';" # T91917
13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T370903)', diff saved to https://phabricator.wikimedia.org/P67918 and previous config saved to /var/cache/conftool/dbconfig/20240827-133933-ladsgroup.json
13:37 zabe@deploy1003: Finished scap sync-world: Backport for Register feature flag for moving wikibase item to Other Projects sidebar in pilot wikis., Enable CampaignEvents Invitation Lists in production testing environments (T373041) (duration: 31m 27s)
13:37 tappof@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::collector and log*.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T370903)', diff saved to https://phabricator.wikimedia.org/P67917 and previous config saved to /var/cache/conftool/dbconfig/20240827-133723-ladsgroup.json
13:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
13:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T370903)', diff saved to https://phabricator.wikimedia.org/P67915 and previous config saved to /var/cache/conftool/dbconfig/20240827-133701-ladsgroup.json
13:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 2%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67914 and previous config saved to /var/cache/conftool/dbconfig/20240827-133452-arnaudb.json
13:33 zabe@deploy1003: joelyrookewmde, daimona, zabe: Continuing with sync
13:29 Daimona: Creating new DB tables for the CampaignEvents extension in x1.testwiki, x1.test2wiki, x1.officewiki, and x1.wikishared # T369303
13:23 tappof@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on P{O:logging::opensearch::collector and log*.codfw.wmnet} and (A:datahubsearch or A:logstash-eqiad or A:logstash-codfw)
13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P67913 and previous config saved to /var/cache/conftool/dbconfig/20240827-132154-ladsgroup.json
13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 1%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67912 and previous config saved to /var/cache/conftool/dbconfig/20240827-131947-arnaudb.json
13:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
13:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T371742)', diff saved to https://phabricator.wikimedia.org/P67911 and previous config saved to /var/cache/conftool/dbconfig/20240827-131031-ladsgroup.json
13:08 zabe@deploy1003: joelyrookewmde, daimona, zabe: Backport for Register feature flag for moving wikibase item to Other Projects sidebar in pilot wikis., Enable CampaignEvents Invitation Lists in production testing environments (T373041) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P67910 and previous config saved to /var/cache/conftool/dbconfig/20240827-130647-ladsgroup.json
13:06 zabe@deploy1003: Started scap sync-world: Backport for Register feature flag for moving wikibase item to Other Projects sidebar in pilot wikis., Enable CampaignEvents Invitation Lists in production testing environments (T373041)
12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P67909 and previous config saved to /var/cache/conftool/dbconfig/20240827-125523-ladsgroup.json
12:55 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2043.codfw.wmnet with OS bullseye
12:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T370903)', diff saved to https://phabricator.wikimedia.org/P67908 and previous config saved to /var/cache/conftool/dbconfig/20240827-125139-ladsgroup.json
12:49 zabe: zabe@mwmaint1002:~$ foreachwikiindblist fishbowl wrapOldPasswords.php --type BEP --update # T91917
12:46 zabe: zabe@mwmaint1002:~$ foreachwikiindblist private wrapOldPasswords.php --type BEP --update # T91917
12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T370903)', diff saved to https://phabricator.wikimedia.org/P67907 and previous config saved to /var/cache/conftool/dbconfig/20240827-124629-ladsgroup.json
12:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
12:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
12:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P67906 and previous config saved to /var/cache/conftool/dbconfig/20240827-124016-ladsgroup.json
12:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
12:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
12:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T370903)', diff saved to https://phabricator.wikimedia.org/P67905 and previous config saved to /var/cache/conftool/dbconfig/20240827-123839-ladsgroup.json
12:38 zabe@deploy1003: Finished scap sync-world: Backport for Revert apparent fix (T368712) (duration: 08m 20s)
12:35 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2043.codfw.wmnet with reason: host reimage
12:34 zabe@deploy1003: zabe: Continuing with sync
12:33 zabe@deploy1003: zabe: Backport for Revert apparent fix (T368712) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:32 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2043.codfw.wmnet with reason: host reimage
12:30 zabe@deploy1003: Started scap sync-world: Backport for Revert apparent fix (T368712)
12:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67904 and previous config saved to /var/cache/conftool/dbconfig/20240827-122910-arnaudb.json
12:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T371742)', diff saved to https://phabricator.wikimedia.org/P67903 and previous config saved to /var/cache/conftool/dbconfig/20240827-122509-ladsgroup.json
12:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P67902 and previous config saved to /var/cache/conftool/dbconfig/20240827-122332-ladsgroup.json
12:18 hashar@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.20 refs T366965
12:15 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fa3b11fc520>
12:15 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2043
12:15 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2019.codfw.wmnet
12:14 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2043
12:14 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2043.codfw.wmnet 162.0.192.10.in-addr.arpa 2.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
12:14 akosiaris@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2043.codfw.wmnet 162.0.192.10.in-addr.arpa 2.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
12:14 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:14 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2043 - akosiaris@cumin1002"
12:14 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2043 - akosiaris@cumin1002"
12:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67901 and previous config saved to /var/cache/conftool/dbconfig/20240827-121405-arnaudb.json
12:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67900 and previous config saved to /var/cache/conftool/dbconfig/20240827-121216-arnaudb.json
12:11 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2019.codfw.wmnet
12:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2015.codfw.wmnet with reason: Network maintenance
12:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc2015.codfw.wmnet with reason: Network maintenance
12:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P67899 and previous config saved to /var/cache/conftool/dbconfig/20240827-120825-ladsgroup.json
12:02 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
12:01 akosiaris@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fa3b11fc520>
12:01 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2043.codfw.wmnet with OS bullseye
12:00 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2292 to wikikube-worker2043
12:00 akosiaris@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2043
11:59 akosiaris@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2043
11:59 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:59 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2292 to wikikube-worker2043 - akosiaris@cumin1002"
11:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67898 and previous config saved to /var/cache/conftool/dbconfig/20240827-115859-arnaudb.json
11:58 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2292 to wikikube-worker2043 - akosiaris@cumin1002"
11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67897 and previous config saved to /var/cache/conftool/dbconfig/20240827-115711-arnaudb.json
11:54 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
11:53 akosiaris@cumin1002: START - Cookbook sre.hosts.rename from mw2292 to wikikube-worker2043
11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T370903)', diff saved to https://phabricator.wikimedia.org/P67896 and previous config saved to /var/cache/conftool/dbconfig/20240827-115318-ladsgroup.json
11:51 kart_: Updated cxserver to 2024-08-27-045705-production (T369815)
11:50 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
11:49 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
11:46 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
11:46 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
11:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T370903)', diff saved to https://phabricator.wikimedia.org/P67895 and previous config saved to /var/cache/conftool/dbconfig/20240827-114608-ladsgroup.json
11:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67894 and previous config saved to /var/cache/conftool/dbconfig/20240827-114546-ladsgroup.json
11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67893 and previous config saved to /var/cache/conftool/dbconfig/20240827-114354-arnaudb.json
11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67892 and previous config saved to /var/cache/conftool/dbconfig/20240827-114205-arnaudb.json
11:39 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7001.magru.wmnet
11:38 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
11:38 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
11:33 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus7001.magru.wmnet
11:32 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2292.codfw.wmnet
11:31 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2292.codfw.wmnet
11:30 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P67891 and previous config saved to /var/cache/conftool/dbconfig/20240827-113039-ladsgroup.json
11:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 15%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67890 and previous config saved to /var/cache/conftool/dbconfig/20240827-112848-arnaudb.json
11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67889 and previous config saved to /var/cache/conftool/dbconfig/20240827-112700-arnaudb.json
11:24 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
11:20 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database cswikivoyage (T370912)
11:20 hashar@deploy1003: Finished scap sync-world: testwikis to 1.43.0-wmf.20 refs T366965 (duration: 47m 15s)
11:20 godog: start prometheus7001 bookworm upgrade - T326657
11:19 claime: Deleting misbehaving pod ipoid-production-daily-updates-28742340-h5ckx - T373427
11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P67887 and previous config saved to /var/cache/conftool/dbconfig/20240827-111532-ladsgroup.json
11:14 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-main2001.codfw.wmnet
11:14 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:14 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-main2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin1002"
11:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67886 and previous config saved to /var/cache/conftool/dbconfig/20240827-111343-arnaudb.json
11:13 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-main2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin1002"
11:12 godog: start prometheus6002 bookworm upgrade - T326657
11:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 16%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67885 and previous config saved to /var/cache/conftool/dbconfig/20240827-111154-arnaudb.json
11:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2030.codfw.wmnet
11:05 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2030.codfw.wmnet
11:00 Dreamy_Jazz: Starting MediaModeration time limited scan on group0 to make up monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
11:00 jayme@cumin1002: START - Cookbook sre.dns.netbox
11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67884 and previous config saved to /var/cache/conftool/dbconfig/20240827-110024-ladsgroup.json
11:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2030.codfw.wmnet with OS bullseye
10:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67883 and previous config saved to /var/cache/conftool/dbconfig/20240827-105837-arnaudb.json
10:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67882 and previous config saved to /var/cache/conftool/dbconfig/20240827-105815-ladsgroup.json
10:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
10:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
10:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 8%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67881 and previous config saved to /var/cache/conftool/dbconfig/20240827-105649-arnaudb.json
10:55 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database cswikivoyage (T370912)
10:54 jayme@cumin1002: START - Cookbook sre.hosts.decommission for hosts kafka-main2001.codfw.wmnet
10:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2018.codfw.wmnet
10:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2018.codfw.wmnet
10:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2028.codfw.wmnet
10:48 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2028.codfw.wmnet
10:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2017.codfw.wmnet
10:48 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2017.codfw.wmnet
10:46 claime: Running homer 'lsw1-a6-codfw*' commit 'T372878'
10:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 2%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67880 and previous config saved to /var/cache/conftool/dbconfig/20240827-104332-arnaudb.json
10:43 claime: Running homer 'lsw1-a5-codfw*' commit 'T372878'
10:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 6%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67879 and previous config saved to /var/cache/conftool/dbconfig/20240827-104143-arnaudb.json
10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2018.codfw.wmnet with OS bullseye
10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
10:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
10:33 hashar@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.20 refs T366965
10:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2028.codfw.wmnet with OS bullseye
10:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: post maintenance', diff saved to https://phabricator.wikimedia.org/P67878 and previous config saved to /var/cache/conftool/dbconfig/20240827-102827-arnaudb.json
10:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 4%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67877 and previous config saved to /var/cache/conftool/dbconfig/20240827-102638-arnaudb.json
10:26 claime: homer 'cr*codfw*' commit 'T372878'
10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2017.codfw.wmnet with OS bullseye
10:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fa8baa9bd90>
10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2030
10:19 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2030
10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2030.codfw.wmnet 177.0.192.10.in-addr.arpa 7.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:19 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2030.codfw.wmnet 177.0.192.10.in-addr.arpa 7.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2030 - cgoubert@cumin1002"
10:19 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2030 - cgoubert@cumin1002"
10:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
10:16 hashar@deploy1003: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.43.0-wmf.20" --no-progress --store-class=LCStoreCDB --threads=22 --lang en --quiet ' returned non-zero exit status 1. (duration: 00m 02s)
10:16 hashar@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.20 refs T366965
10:14 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:14 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fa8baa9bd90>
10:14 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2030.codfw.wmnet with OS bullseye
10:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2124.codfw.wmnet with reason: replag
10:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2124.codfw.wmnet with reason: replag
10:13 hashar@deploy1003: scap failed: PermissionError [Errno 13] Permission denied: '/srv/mediawiki-staging/php-1.43.0-wmf.20/cache/gitinfo' (duration: 00m 00s)
10:13 hashar@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.20 refs T366965
10:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2030.codfw.wmnet
10:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 2%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67876 and previous config saved to /var/cache/conftool/dbconfig/20240827-101132-arnaudb.json
10:11 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2030.codfw.wmnet
10:10 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
10:09 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:07 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:07 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
10:06 hashar@deploy1003: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.43.0-wmf.20" --no-progress --store-class=LCStoreCDB --threads=22 --lang en --quiet ' returned non-zero exit status 1. (duration: 00m 02s)
10:06 hashar@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.20 refs T366965
10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T371742)', diff saved to https://phabricator.wikimedia.org/P67875 and previous config saved to /var/cache/conftool/dbconfig/20240827-100548-ladsgroup.json
10:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
10:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T371742)', diff saved to https://phabricator.wikimedia.org/P67874 and previous config saved to /var/cache/conftool/dbconfig/20240827-100527-ladsgroup.json
10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
10:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f65f7b4bd90>
10:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2018
10:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
10:01 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2018
10:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2018.codfw.wmnet 95.0.192.10.in-addr.arpa 5.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:00 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2018.codfw.wmnet 95.0.192.10.in-addr.arpa 5.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2018 - cgoubert@cumin1002"
10:00 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2018 - cgoubert@cumin1002"
09:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: post upgrade repooling', diff saved to https://phabricator.wikimedia.org/P67873 and previous config saved to /var/cache/conftool/dbconfig/20240827-095627-arnaudb.json
09:53 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:53 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f65f7b4bd90>
09:52 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2018.codfw.wmnet with OS bullseye
09:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2018.codfw.wmnet
09:50 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2018.codfw.wmnet
09:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P67872 and previous config saved to /var/cache/conftool/dbconfig/20240827-095019-ladsgroup.json
09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f6e24a10d30>
09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2028
09:49 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2028
09:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2028.codfw.wmnet 178.0.192.10.in-addr.arpa 8.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:49 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2028.codfw.wmnet 178.0.192.10.in-addr.arpa 8.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2028 - cgoubert@cumin1002"
09:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2028 - cgoubert@cumin1002"
09:45 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:45 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f6e24a10d30>
09:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f057a31dd90>
09:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2017
09:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2028.codfw.wmnet with OS bullseye
09:43 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2017
09:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2017.codfw.wmnet 76.0.192.10.in-addr.arpa 6.7.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:43 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2017.codfw.wmnet 76.0.192.10.in-addr.arpa 6.7.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2017 - cgoubert@cumin1002"
09:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2017 - cgoubert@cumin1002"
09:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:40 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f057a31dd90>
09:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2028.codfw.wmnet
09:39 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2028.codfw.wmnet
09:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2017.codfw.wmnet with OS bullseye
09:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2017.codfw.wmnet
09:36 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2017.codfw.wmnet
09:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P67871 and previous config saved to /var/cache/conftool/dbconfig/20240827-093512-ladsgroup.json
09:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db2124.codfw.wmnet with reason: db2124 fix
09:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db2124.codfw.wmnet with reason: db2124 fix
09:25 hashar: train: fast forwarded mediawiki/core wmf/1.43.0-wmf.20 from 1faf18d6570 to ef87455d7c3 # T366965
09:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T371742)', diff saved to https://phabricator.wikimedia.org/P67870 and previous config saved to /var/cache/conftool/dbconfig/20240827-092005-ladsgroup.json
09:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2114.codfw.wmnet
09:13 marostegui@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:13 marostegui@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2114.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
09:11 marostegui@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2114.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
09:08 marostegui@cumin1002: START - Cookbook sre.dns.netbox
09:04 tappof@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=0) rolling restart_daemons on P{O:logging::opensearch::collector and logstash*.codfw.wmnet} and (A:logstash-collector)
09:02 marostegui@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2114.codfw.wmnet
09:01 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@8d2d8fe] (releasing): (no justification provided) (duration: 00m 48s)
09:01 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2232.codfw.wmnet with OS bookworm
09:00 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@8d2d8fe] (releasing): (no justification provided)
09:00 tappof@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on P{O:logging::opensearch::collector and logstash*.codfw.wmnet} and (A:logstash-collector)
08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2124', diff saved to https://phabricator.wikimedia.org/P67868 and previous config saved to /var/cache/conftool/dbconfig/20240827-085551-arnaudb.json
08:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db1161.eqiad.wmnet with reason: db1161 upgrade
08:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db1161.eqiad.wmnet with reason: db1161 upgrade
08:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1161.eqiad.wmnet
08:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on an-redacteddb1001.eqiad.wmnet with reason: upgrading db1161
08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on an-redacteddb1001.eqiad.wmnet with reason: upgrading db1161
08:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1154.eqiad.wmnet with reason: upgrading db1161
08:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1154.eqiad.wmnet with reason: upgrading db1161
08:39 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1161.eqiad.wmnet
08:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: db1161 upgrade
08:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: db1161 upgrade
08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db1161 - T373328', diff saved to https://phabricator.wikimedia.org/P67867 and previous config saved to /var/cache/conftool/dbconfig/20240827-082923-arnaudb.json
08:18 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2232.codfw.wmnet with OS bookworm
08:18 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2230.codfw.wmnet with OS bookworm
08:18 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2231.codfw.wmnet with OS bookworm
08:18 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2232.codfw.wmnet with OS bookworm
08:01 urbanecm: Clear throttle for 105.113.127.170 via resetAuthenticationThrottle.php (T373414)
08:00 urbanecm@deploy1003: Finished scap sync-world: Backport for Add throttle rule for Wikimedia Hausa edit-a-thon (T373414) (duration: 06m 42s)
07:53 urbanecm@deploy1003: Started scap sync-world: Backport for Add throttle rule for Wikimedia Hausa edit-a-thon (T373414)
07:50 godog: ack probedown for puppetmaster:8181 - T373369
07:49 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
07:45 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
07:26 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2231.codfw.wmnet with OS bookworm
07:26 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2230.codfw.wmnet with OS bookworm
07:22 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2232.codfw.wmnet with OS bookworm
07:12 kartik@deploy1003: Finished scap sync-world: Backport for Section Translation: Fix some language codes (duration: 08m 09s)
07:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T371742)', diff saved to https://phabricator.wikimedia.org/P67866 and previous config saved to /var/cache/conftool/dbconfig/20240827-070845-ladsgroup.json
07:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1243.eqiad.wmnet with reason: Maintenance
07:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1243.eqiad.wmnet with reason: Maintenance
07:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T371742)', diff saved to https://phabricator.wikimedia.org/P67865 and previous config saved to /var/cache/conftool/dbconfig/20240827-070823-ladsgroup.json
07:07 kartik@deploy1003: kartik: Continuing with sync
07:06 kartik@deploy1003: kartik: Backport for Section Translation: Fix some language codes synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:04 kartik@deploy1003: Started scap sync-world: Backport for Section Translation: Fix some language codes
06:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P67864 and previous config saved to /var/cache/conftool/dbconfig/20240827-065316-ladsgroup.json
06:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P67863 and previous config saved to /var/cache/conftool/dbconfig/20240827-063809-ladsgroup.json
06:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T371742)', diff saved to https://phabricator.wikimedia.org/P67862 and previous config saved to /var/cache/conftool/dbconfig/20240827-062302-ladsgroup.json
05:34 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@0b23c91]: (no justification provided) (duration: 00m 18s)
05:33 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@0b23c91]: (no justification provided)
04:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T371742)', diff saved to https://phabricator.wikimedia.org/P67861 and previous config saved to /var/cache/conftool/dbconfig/20240827-041446-ladsgroup.json
04:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1242.eqiad.wmnet with reason: Maintenance
04:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1242.eqiad.wmnet with reason: Maintenance
04:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T371742)', diff saved to https://phabricator.wikimedia.org/P67860 and previous config saved to /var/cache/conftool/dbconfig/20240827-041424-ladsgroup.json
04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.17 (duration: 01m 28s)
03:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P67859 and previous config saved to /var/cache/conftool/dbconfig/20240827-035916-ladsgroup.json
03:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P67858 and previous config saved to /var/cache/conftool/dbconfig/20240827-034409-ladsgroup.json
03:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T371742)', diff saved to https://phabricator.wikimedia.org/P67857 and previous config saved to /var/cache/conftool/dbconfig/20240827-032902-ladsgroup.json
02:49 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox
02:23 brett: Import corto 0.3-1 into bookworm-wikimedia apt archive
01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T371742)', diff saved to https://phabricator.wikimedia.org/P67856 and previous config saved to /var/cache/conftool/dbconfig/20240827-011527-ladsgroup.json
01:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1241.eqiad.wmnet with reason: Maintenance
01:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1241.eqiad.wmnet with reason: Maintenance
01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T371742)', diff saved to https://phabricator.wikimedia.org/P67855 and previous config saved to /var/cache/conftool/dbconfig/20240827-011505-ladsgroup.json
00:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P67854 and previous config saved to /var/cache/conftool/dbconfig/20240827-005958-ladsgroup.json
00:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P67853 and previous config saved to /var/cache/conftool/dbconfig/20240827-004451-ladsgroup.json
00:40 dduvall@deploy1003: Finished deploy [releng/jenkins-deploy@663c843] (releasing): (no justification provided) (duration: 00m 40s)
00:39 dduvall@deploy1003: Started deploy [releng/jenkins-deploy@663c843] (releasing): (no justification provided)
00:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T371742)', diff saved to https://phabricator.wikimedia.org/P67852 and previous config saved to /var/cache/conftool/dbconfig/20240827-002944-ladsgroup.json

2024-08-26

22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T370903)', diff saved to https://phabricator.wikimedia.org/P67851 and previous config saved to /var/cache/conftool/dbconfig/20240826-225933-ladsgroup.json
22:51 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox
22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P67850 and previous config saved to /var/cache/conftool/dbconfig/20240826-224426-ladsgroup.json
22:36 swfrench-wmf: running homer 'cr*codfw*' commit 'T372878' (remove old BGP session config for kubernetes2018, kubernetes2025)
22:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs-main.discovery.wmnet wdqs-scholarly.discovery.wmnet on all recursors
22:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs-main.discovery.wmnet wdqs-scholarly.discovery.wmnet on all recursors
22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P67849 and previous config saved to /var/cache/conftool/dbconfig/20240826-222919-ladsgroup.json
22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T371742)', diff saved to https://phabricator.wikimedia.org/P67848 and previous config saved to /var/cache/conftool/dbconfig/20240826-222351-ladsgroup.json
22:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1238.eqiad.wmnet with reason: Maintenance
22:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1238.eqiad.wmnet with reason: Maintenance
22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T371742)', diff saved to https://phabricator.wikimedia.org/P67847 and previous config saved to /var/cache/conftool/dbconfig/20240826-222328-ladsgroup.json
22:17 zabe@deploy1003: Finished scap sync-world: Backport for Removing 'spamblacklistlog' right from usergroups (T367683) (duration: 06m 58s)
22:14 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2042.codfw.wmnet
22:14 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2042.codfw.wmnet
22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T370903)', diff saved to https://phabricator.wikimedia.org/P67846 and previous config saved to /var/cache/conftool/dbconfig/20240826-221411-ladsgroup.json
22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T370903)', diff saved to https://phabricator.wikimedia.org/P67845 and previous config saved to /var/cache/conftool/dbconfig/20240826-221302-ladsgroup.json
22:13 zabe@deploy1003: superpes, zabe: Continuing with sync
22:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2217.codfw.wmnet with reason: Maintenance
22:12 swfrench-wmf: ran homer 'lsw1-a8-codfw*' commit 'T372878'
22:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2217.codfw.wmnet with reason: Maintenance
22:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
22:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
22:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T370903)', diff saved to https://phabricator.wikimedia.org/P67844 and previous config saved to /var/cache/conftool/dbconfig/20240826-221245-ladsgroup.json
22:12 zabe@deploy1003: superpes, zabe: Backport for Removing 'spamblacklistlog' right from usergroups (T367683) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:10 zabe@deploy1003: Started scap sync-world: Backport for Removing 'spamblacklistlog' right from usergroups (T367683)
22:10 zabe@deploy1003: Finished scap sync-world: Backport for [sysop_plwiki] Change the logo/icon and the favicon (T368712), [arbcom_itwiki] Enable importing from itwiki (T369264) (duration: 07m 13s)
22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P67843 and previous config saved to /var/cache/conftool/dbconfig/20240826-220821-ladsgroup.json
22:05 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2042.codfw.wmnet with OS bullseye
22:05 zabe@deploy1003: superpes, zabe: Continuing with sync
22:04 zabe@deploy1003: superpes, zabe: Backport for [sysop_plwiki] Change the logo/icon and the favicon (T368712), [arbcom_itwiki] Enable importing from itwiki (T369264) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:02 zabe@deploy1003: Started scap sync-world: Backport for [sysop_plwiki] Change the logo/icon and the favicon (T368712), [arbcom_itwiki] Enable importing from itwiki (T369264)
22:01 inflatador: bking@dns1004.wikimedia.org `sudo -i authdns-update` T364364
21:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P67842 and previous config saved to /var/cache/conftool/dbconfig/20240826-215738-ladsgroup.json
21:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P67841 and previous config saved to /var/cache/conftool/dbconfig/20240826-215314-ladsgroup.json
21:45 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2042.codfw.wmnet with reason: host reimage
21:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P67840 and previous config saved to /var/cache/conftool/dbconfig/20240826-214230-ladsgroup.json
21:41 swfrench@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2042.codfw.wmnet with reason: host reimage
21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T371742)', diff saved to https://phabricator.wikimedia.org/P67839 and previous config saved to /var/cache/conftool/dbconfig/20240826-213807-ladsgroup.json
21:31 catrope@deploy1003: Finished scap sync-world: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis" (duration: 31m 21s)
21:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T370903)', diff saved to https://phabricator.wikimedia.org/P67838 and previous config saved to /var/cache/conftool/dbconfig/20240826-212723-ladsgroup.json
21:27 catrope@deploy1003: catrope, trainbranchbot: Continuing with sync
21:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T370903)', diff saved to https://phabricator.wikimedia.org/P67837 and previous config saved to /var/cache/conftool/dbconfig/20240826-212513-ladsgroup.json
21:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2193.codfw.wmnet with reason: Maintenance
21:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2193.codfw.wmnet with reason: Maintenance
21:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67836 and previous config saved to /var/cache/conftool/dbconfig/20240826-212458-ladsgroup.json
21:24 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fe3cb9c1700>
21:24 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2042
21:23 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2042
21:23 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2042.codfw.wmnet 20.0.192.10.in-addr.arpa 0.2.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:23 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2042.codfw.wmnet 20.0.192.10.in-addr.arpa 0.2.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
21:23 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:23 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2042 - swfrench@cumin2002"
21:23 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2042 - swfrench@cumin2002"
21:18 swfrench@cumin2002: START - Cookbook sre.dns.netbox
21:17 swfrench@cumin2002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fe3cb9c1700>
21:17 swfrench@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2042.codfw.wmnet with OS bullseye
21:16 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2042.codfw.wmnet on all recursors
21:16 swfrench@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-worker2042.codfw.wmnet on all recursors
21:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P67835 and previous config saved to /var/cache/conftool/dbconfig/20240826-210951-ladsgroup.json
21:09 swfrench@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2025 to wikikube-worker2042
21:08 swfrench@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2042
21:08 swfrench@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2042
21:08 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:08 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2025 to wikikube-worker2042 - swfrench@cumin2002"
21:07 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2025 to wikikube-worker2042 - swfrench@cumin2002"
21:02 catrope@deploy1003: catrope, trainbranchbot: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:02 swfrench@cumin2002: START - Cookbook sre.dns.netbox
21:01 swfrench@cumin2002: START - Cookbook sre.hosts.rename from kubernetes2025 to wikikube-worker2042
21:00 catrope@deploy1003: Started scap sync-world: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis"
20:58 catrope@deploy1003: Sync cancelled.
20:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P67834 and previous config saved to /var/cache/conftool/dbconfig/20240826-205443-ladsgroup.json
20:52 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2025.codfw.wmnet
20:51 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2025.codfw.wmnet
20:47 catrope@deploy1003: catrope, cscott: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis (T372789) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:44 catrope@deploy1003: Started scap sync-world: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis (T372789)
20:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67833 and previous config saved to /var/cache/conftool/dbconfig/20240826-203936-ladsgroup.json
20:39 catrope@deploy1003: Finished scap sync-world: Backport for Add Chart extension, enable in beta cluster (T369945) (duration: 29m 57s)
20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67832 and previous config saved to /var/cache/conftool/dbconfig/20240826-203726-ladsgroup.json
20:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
20:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T370903)', diff saved to https://phabricator.wikimedia.org/P67831 and previous config saved to /var/cache/conftool/dbconfig/20240826-203715-ladsgroup.json
20:28 catrope@deploy1003: catrope: Continuing with sync
20:28 catrope@deploy1003: catrope: Backport for Add Chart extension, enable in beta cluster (T369945) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P67829 and previous config saved to /var/cache/conftool/dbconfig/20240826-202208-ladsgroup.json
20:21 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main
20:20 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-scholarly
20:09 catrope@deploy1003: Started scap sync-world: Backport for Add Chart extension, enable in beta cluster (T369945)
20:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P67827 and previous config saved to /var/cache/conftool/dbconfig/20240826-200701-ladsgroup.json
20:05 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2041.codfw.wmnet
20:05 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2041.codfw.wmnet
20:05 kamila_: run homer to add wikikube-worker2041 T372878
19:59 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2041.codfw.wmnet with OS bullseye
19:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T370903)', diff saved to https://phabricator.wikimedia.org/P67826 and previous config saved to /var/cache/conftool/dbconfig/20240826-195153-ladsgroup.json
19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T370903)', diff saved to https://phabricator.wikimedia.org/P67825 and previous config saved to /var/cache/conftool/dbconfig/20240826-194944-ladsgroup.json
19:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
19:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67824 and previous config saved to /var/cache/conftool/dbconfig/20240826-194933-ladsgroup.json
19:48 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-scholarly
19:48 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main
19:48 ryankemper: T364368 Manually adding dns discovery resources to etcd corresponding to https://wikitech.wikimedia.org/wiki/LVS#Add_the_DNS_Discovery_Record
19:45 ryankemper: T364368 Merged patch to add dns discovery resources for `wdqs-main` and `wdqs-scholarly` (https://gerrit.wikimedia.org/r/c/operations/dns/+/1064831), and ran puppet on all DNS hosts
19:43 ryankemper: T364368 Merged patch to move lvs state to `production` for `wdqs-main` and `wdqs-scholarly` (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1064848) and ran puppet on all LVS hosts
19:42 ryankemper: T364368 [codfw] `sudo ipvsadm -L -n` on lvs primary looks good, all done with lvs restarts
19:39 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2041.codfw.wmnet with reason: host reimage
19:36 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2041.codfw.wmnet with reason: host reimage
19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P67823 and previous config saved to /var/cache/conftool/dbconfig/20240826-193425-ladsgroup.json
19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T371742)', diff saved to https://phabricator.wikimedia.org/P67822 and previous config saved to /var/cache/conftool/dbconfig/20240826-193032-ladsgroup.json
19:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
19:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T371742)', diff saved to https://phabricator.wikimedia.org/P67821 and previous config saved to /var/cache/conftool/dbconfig/20240826-193003-ladsgroup.json
19:25 ryankemper: T364368 [codfw] `sudo ipvsadm -L -n` on lvs primary looks good, all done with lvs restarts
19:24 sukhe: sukhe@alert1001:~$ sudo systemctl restart ircecho.service
19:24 ryankemper: T364368 [codfw] Restarted lvs primary: `sudo cumin 'A:lvs-low-traffic-codfw' 'systemctl restart pybal.service'`
19:23 ryankemper: T364368 [codfw] `sudo ipvsadm -L -n` on lvs secondary looks good, proceeding
19:21 ryankemper: T280001 [codfw] Restarted lvs secondary: `sudo cumin 'A:lvs-secondary-codfw' 'systemctl restart pybal.service'`
19:20 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f58347ff5e0>
19:20 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2041
19:20 ryankemper: T280001 [codfw] ran puppet on codfw lvs hosts, expecting alerts soon
19:20 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2041
19:20 kamila@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2041.codfw.wmnet 125.0.192.10.in-addr.arpa 5.2.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:20 kamila@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2041.codfw.wmnet 125.0.192.10.in-addr.arpa 5.2.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
19:20 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:20 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2041 - kamila@cumin1002"
19:20 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2041 - kamila@cumin1002"
19:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P67820 and previous config saved to /var/cache/conftool/dbconfig/20240826-191917-ladsgroup.json
19:16 ryankemper: T280001 [eqiad] `sudo ipvsadm -L -n` on lvs primary looks good, proceeding
19:16 ryankemper: T280001 [eqiad] Restarted lvs primary: `sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service'`
19:15 kamila@cumin1002: START - Cookbook sre.dns.netbox
19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P67819 and previous config saved to /var/cache/conftool/dbconfig/20240826-191456-ladsgroup.json
19:14 kamila@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f58347ff5e0>
19:14 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2041.codfw.wmnet with OS bullseye
19:13 ryankemper: T280001 [eqiad] `sudo ipvsadm -L -n` on lvs secondary looks good, proceeding
19:13 ryankemper: T280001 [eqiad] Restarted lvs secondary: `sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service'`
19:12 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2018 to wikikube-worker2041
19:12 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2041
19:12 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2041
19:11 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:11 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2018 to wikikube-worker2041 - kamila@cumin1002"
19:11 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2018 to wikikube-worker2041 - kamila@cumin1002"
19:07 kamila@cumin1002: START - Cookbook sre.dns.netbox
19:07 kamila@cumin1002: START - Cookbook sre.hosts.rename from kubernetes2018 to wikikube-worker2041
19:06 ryankemper: T280001 [eqiad] enabled puppet on eqiad lvs hosts, expecting alerts soon
19:05 ryankemper: T280001 Disabled puppet on all lvs hosts in preparation for rolling restart
19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67818 and previous config saved to /var/cache/conftool/dbconfig/20240826-190411-ladsgroup.json
19:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T370903)', diff saved to https://phabricator.wikimedia.org/P67817 and previous config saved to /var/cache/conftool/dbconfig/20240826-190201-ladsgroup.json
19:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
19:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
19:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T370903)', diff saved to https://phabricator.wikimedia.org/P67816 and previous config saved to /var/cache/conftool/dbconfig/20240826-190145-ladsgroup.json
18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P67815 and previous config saved to /var/cache/conftool/dbconfig/20240826-185948-ladsgroup.json
18:50 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1023*
18:48 cstone: payments-wiki upgraded from 2551f261 to 0455b791
18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P67814 and previous config saved to /var/cache/conftool/dbconfig/20240826-184638-ladsgroup.json
18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T371742)', diff saved to https://phabricator.wikimedia.org/P67813 and previous config saved to /var/cache/conftool/dbconfig/20240826-184441-ladsgroup.json
18:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P67812 and previous config saved to /var/cache/conftool/dbconfig/20240826-183131-ladsgroup.json
18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T370903)', diff saved to https://phabricator.wikimedia.org/P67811 and previous config saved to /var/cache/conftool/dbconfig/20240826-181624-ladsgroup.json
18:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T370903)', diff saved to https://phabricator.wikimedia.org/P67810 and previous config saved to /var/cache/conftool/dbconfig/20240826-181414-ladsgroup.json
18:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
18:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
18:11 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
18:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
18:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
18:08 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
17:53 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
17:52 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
17:52 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
17:51 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
17:43 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-main
17:43 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-scholarly
17:41 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
17:41 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
17:40 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2018.codfw.wmnet
17:40 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
17:39 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2018.codfw.wmnet
17:39 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
17:39 ryankemper: T364364 Created PTR & A records for new graph split services `wdqs-main` and `wdqs-scholarly` (merged https://gerrit.wikimedia.org/r/c/operations/dns/+/1051446 and ran `sudo authdns-update` on `dns1004.wikimedia.org`)
17:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 11 hosts with reason: Maintenance
17:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on 11 hosts with reason: Maintenance
17:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
17:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
17:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T370903)', diff saved to https://phabricator.wikimedia.org/P67809 and previous config saved to /var/cache/conftool/dbconfig/20240826-172250-ladsgroup.json
17:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P67808 and previous config saved to /var/cache/conftool/dbconfig/20240826-170742-ladsgroup.json
16:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2035.codfw.wmnet
16:54 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2035.codfw.wmnet
16:53 claime: homer 'lsw1-b8-codfw*' commit T372878
16:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2035.codfw.wmnet with OS bullseye
16:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P67807 and previous config saved to /var/cache/conftool/dbconfig/20240826-165235-ladsgroup.json
16:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T370903)', diff saved to https://phabricator.wikimedia.org/P67806 and previous config saved to /var/cache/conftool/dbconfig/20240826-163728-ladsgroup.json
16:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2035.codfw.wmnet with reason: host reimage
16:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T370903)', diff saved to https://phabricator.wikimedia.org/P67805 and previous config saved to /var/cache/conftool/dbconfig/20240826-163032-ladsgroup.json
16:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
16:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
16:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2035.codfw.wmnet with reason: host reimage
16:28 claime: homer 'cr*codfw*' commit 'T372878'
16:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
16:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T370903)', diff saved to https://phabricator.wikimedia.org/P67804 and previous config saved to /var/cache/conftool/dbconfig/20240826-162553-ladsgroup.json
16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T371742)', diff saved to https://phabricator.wikimedia.org/P67803 and previous config saved to /var/cache/conftool/dbconfig/20240826-162544-ladsgroup.json
16:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
16:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67802 and previous config saved to /var/cache/conftool/dbconfig/20240826-162522-ladsgroup.json
16:13 dancy@deploy1003: Stopping before sync operations
16:13 dancy@deploy1003: Started scap sync-world: testing
16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P67801 and previous config saved to /var/cache/conftool/dbconfig/20240826-161039-ladsgroup.json
16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f6bc9767d90>
16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2035
16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P67800 and previous config saved to /var/cache/conftool/dbconfig/20240826-161015-ladsgroup.json
16:10 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2035
16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2035.codfw.wmnet 62.16.192.10.in-addr.arpa 2.6.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:10 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2035.codfw.wmnet 62.16.192.10.in-addr.arpa 2.6.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2035 - cgoubert@cumin1002"
16:10 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2035 - cgoubert@cumin1002"
16:06 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
16:05 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f6bc9767d90>
16:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2035.codfw.wmnet with OS bullseye
16:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2001.codfw.wmnet
16:03 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2001.codfw.wmnet
16:01 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2001.codfw.wmnet with OS bullseye
16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2001.codfw.wmnet with OS bullseye
15:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2035.codfw.wmnet
15:57 jdrewniak@deploy1003: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 14s)
15:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2001.codfw.wmnet
15:57 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2035.codfw.wmnet
15:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2001.codfw.wmnet
15:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P67799 and previous config saved to /var/cache/conftool/dbconfig/20240826-155531-ladsgroup.json
15:55 jdrewniak@deploy1003: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 09m 39s)
15:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P67798 and previous config saved to /var/cache/conftool/dbconfig/20240826-155507-ladsgroup.json
15:47 sukhe: finished upgrading A:cp-eqsin to ATS 9.2.5: T339134
15:47 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and A:cp for 9.2.5-1wm2
15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T370903)', diff saved to https://phabricator.wikimedia.org/P67797 and previous config saved to /var/cache/conftool/dbconfig/20240826-154024-ladsgroup.json
15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67796 and previous config saved to /var/cache/conftool/dbconfig/20240826-154000-ladsgroup.json
15:37 jan_drewniak: starting Wikimedia Portals Update. https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1066804
15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T370903)', diff saved to https://phabricator.wikimedia.org/P67795 and previous config saved to /var/cache/conftool/dbconfig/20240826-153415-ladsgroup.json
15:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1231.eqiad.wmnet with reason: Maintenance
15:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1231.eqiad.wmnet with reason: Maintenance
15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2014.codfw.wmnet
15:29 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2014.codfw.wmnet
15:28 claime: homer 'lsw1-a5-codfw*' commit 'T372878'
15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2014.codfw.wmnet with OS bullseye
15:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
15:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
15:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T370903)', diff saved to https://phabricator.wikimedia.org/P67794 and previous config saved to /var/cache/conftool/dbconfig/20240826-152715-ladsgroup.json
15:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P67793 and previous config saved to /var/cache/conftool/dbconfig/20240826-151207-ladsgroup.json
15:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
15:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
15:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2008.codfw.wmnet
15:03 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2008.codfw.wmnet
15:02 claime: homer 'lsw1-b6-codfw*' commit T372878
15:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host rpki2003.codfw.wmnet
15:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rpki2003.codfw.wmnet with OS bookworm
15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2008.codfw.wmnet with OS bullseye
14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P67792 and previous config saved to /var/cache/conftool/dbconfig/20240826-145700-ladsgroup.json
14:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2034.codfw.wmnet
14:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2034.codfw.wmnet
14:55 claime: homer 'lsw1-a3-codfw*' commit T372878
14:54 claime: homer 'lsw-a3-codfw*' commit T372878
14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2034.codfw.wmnet with OS bullseye
14:50 claime: Running homer 'cr*codfw*' commit 'T372878'
14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f15affd4d00>
14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2014
14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2004.codfw.wmnet
14:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2004.codfw.wmnet
14:49 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2014
14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2014.codfw.wmnet 70.0.192.10.in-addr.arpa 0.7.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:49 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2014.codfw.wmnet 70.0.192.10.in-addr.arpa 0.7.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2014 - cgoubert@cumin1002"
14:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2014 - cgoubert@cumin1002"
14:47 claime: homer 'lsw1-b3-codfw*' commit T372878
14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2004.codfw.wmnet with OS bullseye
14:45 dancy@deploy1003: Installation of scap version "4.100.0" completed for 211 hosts
14:45 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f15affd4d00>
14:44 dancy@deploy1003: Installing scap version "4.100.0" for 211 hosts
14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2014.codfw.wmnet with OS bullseye
14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2014.codfw.wmnet
14:41 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2014.codfw.wmnet
14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T370903)', diff saved to https://phabricator.wikimedia.org/P67791 and previous config saved to /var/cache/conftool/dbconfig/20240826-144153-ladsgroup.json
14:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2013.codfw.wmnet
14:41 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2013.codfw.wmnet
14:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
14:40 claime: homer 'lsw1-a5-codfw*' commit 'T372878'
14:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2013.codfw.wmnet with OS bullseye
14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T370903)', diff saved to https://phabricator.wikimedia.org/P67790 and previous config saved to /var/cache/conftool/dbconfig/20240826-143844-ladsgroup.json
14:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance
14:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance
14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67789 and previous config saved to /var/cache/conftool/dbconfig/20240826-143822-ladsgroup.json
14:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
14:36 Dreamy_Jazz: Started 6hr maximum scan on group2 - https://wikitech.wikimedia.org/wiki/MediaModeration
14:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
14:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
14:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
14:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
14:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P67788 and previous config saved to /var/cache/conftool/dbconfig/20240826-142315-ladsgroup.json
14:21 claime: Running homer 'cr*codfw*' commit 'T372878'
14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f33b17ddd90>
14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2008
14:20 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2008
14:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2008.codfw.wmnet 196.16.192.10.in-addr.arpa 6.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:20 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2008.codfw.wmnet 196.16.192.10.in-addr.arpa 6.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2008 - cgoubert@cumin1002"
14:20 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2008 - cgoubert@cumin1002"
14:20 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
14:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
14:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:17 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f33b17ddd90>
14:16 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2008.codfw.wmnet with OS bullseye
14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2008.codfw.wmnet
14:03 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:03 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f13a466bd60>
14:02 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2004.codfw.wmnet with OS bullseye
14:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2004.codfw.wmnet
14:00 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2004.codfw.wmnet
14:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2004.codfw.wmnet
14:00 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2004.codfw.wmnet
13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fc4fbcc0d30>
13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2013
13:59 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2013
13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2013.codfw.wmnet 68.0.192.10.in-addr.arpa 8.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:59 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2013.codfw.wmnet 68.0.192.10.in-addr.arpa 8.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2013 - cgoubert@cumin1002"
13:56 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2013 - cgoubert@cumin1002"
13:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rpki2003.codfw.wmnet with reason: host reimage
13:53 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
13:53 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fc4fbcc0d30>
13:53 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2013.codfw.wmnet with OS bullseye
13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67786 and previous config saved to /var/cache/conftool/dbconfig/20240826-135301-ladsgroup.json
13:52 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on rpki2003.codfw.wmnet with reason: host reimage
13:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2013.codfw.wmnet
13:51 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2013.codfw.wmnet
13:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T370903)', diff saved to https://phabricator.wikimedia.org/P67785 and previous config saved to /var/cache/conftool/dbconfig/20240826-135052-ladsgroup.json
13:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
13:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
13:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T370903)', diff saved to https://phabricator.wikimedia.org/P67784 and previous config saved to /var/cache/conftool/dbconfig/20240826-135031-ladsgroup.json
13:45 urbanecm@deploy1003: Finished scap sync-world: Backport for use shellbox-video globally (adding group2, including commons) (T356241) (duration: 08m 04s)
13:45 Dreamy_Jazz: Started 6hr maximum scan on nowiki - https://wikitech.wikimedia.org/wiki/MediaModeration
13:41 urbanecm@deploy1003: hnowlan, urbanecm: Continuing with sync
13:40 urbanecm@deploy1003: hnowlan, urbanecm: Backport for use shellbox-video globally (adding group2, including commons) (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:37 urbanecm@deploy1003: Started scap sync-world: Backport for use shellbox-video globally (adding group2, including commons) (T356241)
13:36 urbanecm@deploy1003: Finished scap sync-world: Backport for Rollout Parsoid Kartographer support on all wikis (T342871), scripts: add script for running jobs from stdin rather than http (T369048) (duration: 26m 53s)
13:35 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host rpki2003.codfw.wmnet with OS bookworm
13:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P67783 and previous config saved to /var/cache/conftool/dbconfig/20240826-133524-ladsgroup.json
13:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM rpki2003.codfw.wmnet - ayounsi@cumin1002"
13:34 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM rpki2003.codfw.wmnet - ayounsi@cumin1002"
13:34 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and A:cp for 9.2.5-1wm2
13:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) rpki2003.codfw.wmnet on all recursors
13:34 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache rpki2003.codfw.wmnet on all recursors
13:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM rpki2003.codfw.wmnet - ayounsi@cumin1002"
13:34 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM rpki2003.codfw.wmnet - ayounsi@cumin1002"
13:30 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
13:30 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host rpki2003.codfw.wmnet
13:29 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host rpki2003.codfw.wmnet
13:29 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host rpki2003.codfw.wmnet
13:28 urbanecm@deploy1003: hnowlan, urbanecm, ihurbain: Continuing with sync
13:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67782 and previous config saved to /var/cache/conftool/dbconfig/20240826-132738-ladsgroup.json
13:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
13:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
13:24 urbanecm@deploy1003: hnowlan, urbanecm, ihurbain: Backport for Rollout Parsoid Kartographer support on all wikis (T342871), scripts: add script for running jobs from stdin rather than http (T369048) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P67781 and previous config saved to /var/cache/conftool/dbconfig/20240826-132016-ladsgroup.json
13:09 urbanecm@deploy1003: Started scap sync-world: Backport for Rollout Parsoid Kartographer support on all wikis (T342871), scripts: add script for running jobs from stdin rather than http (T369048)
13:07 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
13:06 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
13:06 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:05 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T370903)', diff saved to https://phabricator.wikimedia.org/P67780 and previous config saved to /var/cache/conftool/dbconfig/20240826-130510-ladsgroup.json
13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T370903)', diff saved to https://phabricator.wikimedia.org/P67779 and previous config saved to /var/cache/conftool/dbconfig/20240826-130401-ladsgroup.json
13:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
13:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
13:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67778 and previous config saved to /var/cache/conftool/dbconfig/20240826-130350-ladsgroup.json
12:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P67777 and previous config saved to /var/cache/conftool/dbconfig/20240826-124843-ladsgroup.json
12:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P67776 and previous config saved to /var/cache/conftool/dbconfig/20240826-123336-ladsgroup.json
12:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Weight db2214 T373174', diff saved to https://phabricator.wikimedia.org/P67775 and previous config saved to /var/cache/conftool/dbconfig/20240826-123205-arnaudb.json
12:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2129 to s6 primary T373174', diff saved to https://phabricator.wikimedia.org/P67774 and previous config saved to /var/cache/conftool/dbconfig/20240826-122925-arnaudb.json
12:28 arnaudb: Starting s6 codfw failover from db2214 to db2129 - T373174
12:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: Testing
12:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: Testing
12:21 godog: move to /root unused and about to expire cert on puppetmaster1001:/var/lib/puppet/server/ssl/ca/signed/webperf.discovery.wmnet.pem
12:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67773 and previous config saved to /var/cache/conftool/dbconfig/20240826-121828-ladsgroup.json
12:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268434
12:17 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268434
12:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263903
12:17 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 263903
12:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61754
12:17 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61754
12:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 269115
12:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 269115
12:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 274607
12:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 274607
12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T370903)', diff saved to https://phabricator.wikimedia.org/P67772 and previous config saved to /var/cache/conftool/dbconfig/20240826-121419-ladsgroup.json
12:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
12:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T370903)', diff saved to https://phabricator.wikimedia.org/P67771 and previous config saved to /var/cache/conftool/dbconfig/20240826-121408-ladsgroup.json
12:12 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
12:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2129 with weight 0 T373174', diff saved to https://phabricator.wikimedia.org/P67770 and previous config saved to /var/cache/conftool/dbconfig/20240826-120921-arnaudb.json
12:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s6 T373174
12:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s6 T373174
12:05 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P67769 and previous config saved to /var/cache/conftool/dbconfig/20240826-115901-ladsgroup.json
11:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
11:53 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P67768 and previous config saved to /var/cache/conftool/dbconfig/20240826-114354-ladsgroup.json
11:41 hashar@deploy1003: Finished deploy [integration/docroot@c3352dd]: build: update mediawiki/mediawiki-codesniffer to 44.0.0 and micromatch to 4.0.8 (duration: 00m 06s)
11:41 hashar@deploy1003: Started deploy [integration/docroot@c3352dd]: build: update mediawiki/mediawiki-codesniffer to 44.0.0 and micromatch to 4.0.8
11:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
11:29 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T370903)', diff saved to https://phabricator.wikimedia.org/P67767 and previous config saved to /var/cache/conftool/dbconfig/20240826-112847-ladsgroup.json
11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T370903)', diff saved to https://phabricator.wikimedia.org/P67766 and previous config saved to /var/cache/conftool/dbconfig/20240826-112739-ladsgroup.json
11:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
11:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
11:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
11:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
11:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
11:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
11:16 vgutierrez@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
11:13 vgutierrez@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
10:46 Dreamy_Jazz: Started a maximum 6 hr scan on ruwiki for MediaModeration - https://wikitech.wikimedia.org/wiki/MediaModeration
10:39 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
10:00 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
09:59 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
09:47 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
09:45 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
09:43 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
09:42 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
09:42 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
09:40 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
09:37 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test1003.wikimedia.org
09:37 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:37 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
09:36 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
09:33 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
09:28 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp-test1003.wikimedia.org
09:27 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp1003.wikimedia.org
09:27 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:27 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
09:25 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
09:22 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
09:17 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp1003.wikimedia.org
08:56 arnaudb@cumin1002: dbctl commit (dc=all): 'weight db2212 T373173', diff saved to https://phabricator.wikimedia.org/P67763 and previous config saved to /var/cache/conftool/dbconfig/20240826-085621-arnaudb.json
08:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2203 to s1 primary T373173', diff saved to https://phabricator.wikimedia.org/P67762 and previous config saved to /var/cache/conftool/dbconfig/20240826-085048-arnaudb.json
08:50 arnaudb: Starting s1 codfw failover from db2212 to db2203 - T373173
08:49 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp2003.wikimedia.org
08:49 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:49 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
08:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T373173 - repeat due to T373295
08:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T373173 - repeat due to T373295
08:48 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
08:45 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
08:40 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp2003.wikimedia.org
08:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Primary switchover s1 node in failure
08:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Primary switchover s1 node in failure
08:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 depool', diff saved to https://phabricator.wikimedia.org/P67760 and previous config saved to /var/cache/conftool/dbconfig/20240826-081753-arnaudb.json
07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2203 with weight 0 T373173', diff saved to https://phabricator.wikimedia.org/P67758 and previous config saved to /var/cache/conftool/dbconfig/20240826-074113-arnaudb.json
07:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T373173
07:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T373173
07:21 arnaudb@cumin1002: dbctl commit (dc=all): 'rebalance weights T373168', diff saved to https://phabricator.wikimedia.org/P67757 and previous config saved to /var/cache/conftool/dbconfig/20240826-072119-arnaudb.json
07:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es2039 to es7 primary and set section read-write T373168', diff saved to https://phabricator.wikimedia.org/P67756 and previous config saved to /var/cache/conftool/dbconfig/20240826-072028-arnaudb.json
07:19 arnaudb: Starting es7 codfw failover from es2038 to es2039 - T373168
07:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es2039 with weight 0 T373168', diff saved to https://phabricator.wikimedia.org/P67755 and previous config saved to /var/cache/conftool/dbconfig/20240826-071504-arnaudb.json
07:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es7 T373168
07:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es7 T373168
06:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
06:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 32934

2024-08-25

15:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T367856)', diff saved to https://phabricator.wikimedia.org/P67754 and previous config saved to /var/cache/conftool/dbconfig/20240825-153206-marostegui.json
15:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2165.codfw.wmnet with reason: Maintenance
15:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2165.codfw.wmnet with reason: Maintenance
15:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367856)', diff saved to https://phabricator.wikimedia.org/P67753 and previous config saved to /var/cache/conftool/dbconfig/20240825-153144-marostegui.json
15:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P67752 and previous config saved to /var/cache/conftool/dbconfig/20240825-151637-marostegui.json
15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P67751 and previous config saved to /var/cache/conftool/dbconfig/20240825-150130-marostegui.json
14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367856)', diff saved to https://phabricator.wikimedia.org/P67750 and previous config saved to /var/cache/conftool/dbconfig/20240825-144623-marostegui.json
08:05 oblivian@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67749 and previous config saved to /var/cache/conftool/dbconfig/20240825-080544-oblivian.json
07:50 oblivian@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67748 and previous config saved to /var/cache/conftool/dbconfig/20240825-075038-oblivian.json
07:35 oblivian@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67747 and previous config saved to /var/cache/conftool/dbconfig/20240825-073533-oblivian.json
07:20 oblivian@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67746 and previous config saved to /var/cache/conftool/dbconfig/20240825-072027-oblivian.json
07:05 oblivian@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67745 and previous config saved to /var/cache/conftool/dbconfig/20240825-070522-oblivian.json
06:57 _joe_: repairing mgwiktionary.pagelinks on db1161
06:12 oblivian@cumin1002: dbctl commit (dc=all): 'depooling db1161, broken replica', diff saved to https://phabricator.wikimedia.org/P67744 and previous config saved to /var/cache/conftool/dbconfig/20240825-061206-oblivian.json

2024-08-24

22:13 ejegg: civicrm upgraded from 75c86184 to f70d753c

2024-08-23

22:26 eileen: civicrm upgraded from e629834c to 75c86184 (that didn't turn out to have anything relevant to the new deduper error)
16:50 conniecc1@deploy1003: Finished deploy [airflow-dags/analytics_product@c55c7de]: (no justification provided) (duration: 00m 03s)
16:50 conniecc1@deploy1003: Started deploy [airflow-dags/analytics_product@c55c7de]: (no justification provided)
16:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T371742)', diff saved to https://phabricator.wikimedia.org/P67740 and previous config saved to /var/cache/conftool/dbconfig/20240823-164554-ladsgroup.json
16:45 nettrom@deploy1003: Finished deploy [airflow-dags/analytics_product@c55c7de]: (no justification provided) (duration: 00m 17s)
16:45 nettrom@deploy1003: Started deploy [airflow-dags/analytics_product@c55c7de]: (no justification provided)
16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating mgmt for frack servers in codfw - jhancock@cumin2002"
16:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating mgmt for frack servers in codfw - jhancock@cumin2002"
16:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:33 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
16:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
16:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P67738 and previous config saved to /var/cache/conftool/dbconfig/20240823-163047-ladsgroup.json
16:19 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bookworm
16:16 bearloga@deploy1003: Finished deploy [airflow-dags/wmde@c55c7de]: (no justification provided) (duration: 00m 06s)
16:16 bearloga@deploy1003: Started deploy [airflow-dags/wmde@c55c7de]: (no justification provided)
16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P67737 and previous config saved to /var/cache/conftool/dbconfig/20240823-161540-ladsgroup.json
16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T371742)', diff saved to https://phabricator.wikimedia.org/P67736 and previous config saved to /var/cache/conftool/dbconfig/20240823-160033-ladsgroup.json
15:59 claime: Running homer 'cr*codfw*' commit T372878
15:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2027.codfw.wmnet
15:59 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2027.codfw.wmnet
15:54 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
15:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 17:00:00 on wdqs[1023-1024].eqiad.wmnet with reason: noisy alerts related to graph split T337013
15:52 bking@cumin2002: START - Cookbook sre.hosts.downtime for 17:00:00 on wdqs[1023-1024].eqiad.wmnet with reason: noisy alerts related to graph split T337013
15:52 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
15:38 claime: Running homer 'lsw1-a6-codfw*' commit T372878
15:35 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
15:35 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
15:33 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bookworm
15:32 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1001.eqiad.wmnet with OS bookworm
15:29 jgleeson: updated civicrm from 975fc66e to e629834c
15:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T367856)', diff saved to https://phabricator.wikimedia.org/P67735 and previous config saved to /var/cache/conftool/dbconfig/20240823-151730-marostegui.json
15:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 14:00:00 on db2186.codfw.wmnet with reason: Maintenance
15:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 14:00:00 on db2186.codfw.wmnet with reason: Maintenance
15:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2164.codfw.wmnet with reason: Maintenance
15:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2164.codfw.wmnet with reason: Maintenance
15:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67734 and previous config saved to /var/cache/conftool/dbconfig/20240823-151704-marostegui.json
15:11 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bookworm
15:09 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1002.eqiad.wmnet with OS bookworm
15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P67733 and previous config saved to /var/cache/conftool/dbconfig/20240823-150156-marostegui.json
14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P67732 and previous config saved to /var/cache/conftool/dbconfig/20240823-144649-marostegui.json
14:45 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
14:42 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T370903)', diff saved to https://phabricator.wikimedia.org/P67731 and previous config saved to /var/cache/conftool/dbconfig/20240823-143952-ladsgroup.json
14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2027.codfw.wmnet with OS bullseye
14:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67730 and previous config saved to /var/cache/conftool/dbconfig/20240823-143140-marostegui.json
14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P67729 and previous config saved to /var/cache/conftool/dbconfig/20240823-142445-ladsgroup.json
14:22 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bookworm
14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T371742)', diff saved to https://phabricator.wikimedia.org/P67728 and previous config saved to /var/cache/conftool/dbconfig/20240823-141841-ladsgroup.json
14:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2007.codfw.wmnet
14:18 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2007.codfw.wmnet
14:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
14:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T371742)', diff saved to https://phabricator.wikimedia.org/P67727 and previous config saved to /var/cache/conftool/dbconfig/20240823-141819-ladsgroup.json
14:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P67725 and previous config saved to /var/cache/conftool/dbconfig/20240823-140312-ladsgroup.json
14:01 claime: Running homer 'cr*codfw*' commit 'T372878'
13:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f3c3b32f220>
13:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2027
13:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
13:55 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2027
13:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2027.codfw.wmnet 176.0.192.10.in-addr.arpa 6.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:55 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2027.codfw.wmnet 176.0.192.10.in-addr.arpa 6.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2027 - cgoubert@cumin1002"
13:55 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2027 - cgoubert@cumin1002"
13:54 stran@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T370903)', diff saved to https://phabricator.wikimedia.org/P67724 and previous config saved to /var/cache/conftool/dbconfig/20240823-135431-ladsgroup.json
13:54 milimetric@deploy1003: Finished deploy [analytics/refinery@e5d0d48] (thin): Special deploy to make sure sqoop logic matches schema change (duration: 04m 48s)
13:54 stran@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
13:53 stran@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
13:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
13:52 stran@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
13:52 stran@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
13:52 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
13:51 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f3c3b32f220>
13:51 stran@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
13:51 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2027.codfw.wmnet with OS bullseye
13:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2027.codfw.wmnet
13:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2027.codfw.wmnet
13:49 milimetric@deploy1003: Started deploy [analytics/refinery@e5d0d48] (thin): Special deploy to make sure sqoop logic matches schema change
13:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P67723 and previous config saved to /var/cache/conftool/dbconfig/20240823-134805-ladsgroup.json
13:42 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:37 milimetric@deploy1003: Finished deploy [analytics/refinery@e5d0d48]: Special deploy to make sure sqoop logic matches schema change (duration: 07m 22s)
13:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7ffa0cd98d60>
13:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2007
13:35 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2007
13:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2007.codfw.wmnet 195.16.192.10.in-addr.arpa 5.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:34 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2007.codfw.wmnet 195.16.192.10.in-addr.arpa 5.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
13:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2007 - cgoubert@cumin1002"
13:34 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2007 - cgoubert@cumin1002"
13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T371742)', diff saved to https://phabricator.wikimedia.org/P67722 and previous config saved to /var/cache/conftool/dbconfig/20240823-133258-ladsgroup.json
13:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Decom next week
13:32 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Decom next week
13:31 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
13:31 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7ffa0cd98d60>
13:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2007.codfw.wmnet with OS bullseye
13:30 milimetric@deploy1003: Started deploy [analytics/refinery@e5d0d48]: Special deploy to make sure sqoop logic matches schema change
13:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2007.codfw.wmnet
13:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T370903)', diff saved to https://phabricator.wikimedia.org/P67721 and previous config saved to /var/cache/conftool/dbconfig/20240823-132838-ladsgroup.json
13:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2219.codfw.wmnet with reason: Maintenance
13:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2219.codfw.wmnet with reason: Maintenance
13:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67720 and previous config saved to /var/cache/conftool/dbconfig/20240823-132804-ladsgroup.json
13:25 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2007.codfw.wmnet
13:21 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1003.eqiad.wmnet with OS bookworm
13:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2033.codfw.wmnet
13:18 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2033.codfw.wmnet
13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker2033.codfw.wmnet
13:17 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2033.codfw.wmnet
13:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P67719 and previous config saved to /var/cache/conftool/dbconfig/20240823-131257-ladsgroup.json
13:10 claime: Running homer 'cr*codfw*' commit 'T372878'
13:09 milimetric@deploy1003: Finished deploy [analytics/refinery@e5d0d48]: Special deploy to make sure sqoop logic matches schema change (duration: 01m 57s)
13:09 claime: Running homer 'lsw1-a3-codfw*' commit 'T372878'
13:07 milimetric@deploy1003: Started deploy [analytics/refinery@e5d0d48]: Special deploy to make sure sqoop logic matches schema change
12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P67718 and previous config saved to /var/cache/conftool/dbconfig/20240823-125750-ladsgroup.json
12:57 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
12:54 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67717 and previous config saved to /var/cache/conftool/dbconfig/20240823-124243-ladsgroup.json
12:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:34 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1003.eqiad.wmnet with OS bookworm
12:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
12:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
12:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1004.eqiad.wmnet with OS bookworm
12:20 arnaudb@cumin1002: END (ERROR) - Cookbook sre.switchdc.databases.prepare (exit_code=97) for the (test) switch
12:20 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
12:17 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
12:17 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
12:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67716 and previous config saved to /var/cache/conftool/dbconfig/20240823-121653-ladsgroup.json
12:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2210.codfw.wmnet with reason: Maintenance
12:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2210.codfw.wmnet with reason: Maintenance
12:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T370903)', diff saved to https://phabricator.wikimedia.org/P67715 and previous config saved to /var/cache/conftool/dbconfig/20240823-121631-ladsgroup.json
12:16 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch
12:16 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch
12:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2033.codfw.wmnet with OS bullseye
12:08 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
12:04 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
12:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P67714 and previous config saved to /var/cache/conftool/dbconfig/20240823-120124-ladsgroup.json
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T371742)', diff saved to https://phabricator.wikimedia.org/P67713 and previous config saved to /var/cache/conftool/dbconfig/20240823-115358-ladsgroup.json
11:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
11:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67712 and previous config saved to /var/cache/conftool/dbconfig/20240823-115336-ladsgroup.json
11:52 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
11:48 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch from test-s1 to test-s1
11:48 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch from test-s1 to test-s1
11:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P67711 and previous config saved to /var/cache/conftool/dbconfig/20240823-114616-ladsgroup.json
11:44 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bookworm
11:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P67710 and previous config saved to /var/cache/conftool/dbconfig/20240823-113829-ladsgroup.json
11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fa9a1bb7d00>
11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2033
11:35 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2033
11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2033.codfw.wmnet 55.0.192.10.in-addr.arpa 5.5.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:35 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2033.codfw.wmnet 55.0.192.10.in-addr.arpa 5.5.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2033 - cgoubert@cumin1002"
11:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2033 - cgoubert@cumin1002"
11:32 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2003.codfw.wmnet
11:32 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2003.codfw.wmnet
11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fa9a1bb7d00>
11:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T370903)', diff saved to https://phabricator.wikimedia.org/P67709 and previous config saved to /var/cache/conftool/dbconfig/20240823-113109-ladsgroup.json
11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2033.codfw.wmnet with OS bullseye
11:30 claime: Running homer 'lsw1-b3-codfw*' commit 'T372878'
11:28 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1004.eqiad.wmnet with OS bookworm
11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2003.codfw.wmnet with OS bullseye
11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2033.codfw.wmnet
11:28 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2033.codfw.wmnet
11:27 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch from test-s1 to test-s1
11:27 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch from test-s1 to test-s1
11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2026.codfw.wmnet
11:27 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2026.codfw.wmnet
11:23 claime: Running homer 'lsw1-a3-codfw*' commit 'T372878'
11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P67708 and previous config saved to /var/cache/conftool/dbconfig/20240823-112320-ladsgroup.json
11:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
11:16 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bookworm
11:16 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1004.eqiad.wmnet with OS bookworm
11:16 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the (test) switch from test-s1 to test-s1
11:16 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the (test) switch from test-s1 to test-s1
11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67707 and previous config saved to /var/cache/conftool/dbconfig/20240823-110813-ladsgroup.json
11:07 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
11:07 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bookworm
11:05 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1005.eqiad.wmnet with OS bookworm
11:05 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
11:03 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
11:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
11:01 claime: running homer 'cr*codfw*' commit T372878
10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T370903)', diff saved to https://phabricator.wikimedia.org/P67706 and previous config saved to /var/cache/conftool/dbconfig/20240823-105938-ladsgroup.json
10:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2206.codfw.wmnet with reason: Maintenance
10:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2206.codfw.wmnet with reason: Maintenance
10:58 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
10:56 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
10:55 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
10:55 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
10:55 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
10:54 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
10:54 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
10:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2012.codfw.wmnet
10:53 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2012.codfw.wmnet
10:53 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
10:53 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
10:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f0f758a2d30>
10:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2003
10:47 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2003
10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2003.codfw.wmnet 177.16.192.10.in-addr.arpa 7.7.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:46 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2003.codfw.wmnet 177.16.192.10.in-addr.arpa 7.7.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2003 - cgoubert@cumin1002"
10:46 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2003 - cgoubert@cumin1002"
10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2012.codfw.wmnet with OS bullseye
10:42 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:42 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f0f758a2d30>
10:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2003.codfw.wmnet with OS bullseye
10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f0fc17f5d00>
10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2026
10:41 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2026
10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2026.codfw.wmnet 170.0.192.10.in-addr.arpa 0.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:40 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2026.codfw.wmnet 170.0.192.10.in-addr.arpa 0.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2026 - cgoubert@cumin1002"
10:40 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2026 - cgoubert@cumin1002"
10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2003.codfw.wmnet
10:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
10:39 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2003.codfw.wmnet
10:37 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
10:36 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:36 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f0fc17f5d00>
10:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2026.codfw.wmnet
10:35 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2026.codfw.wmnet
10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2025.codfw.wmnet
10:34 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2025.codfw.wmnet
10:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2025.codfw.wmnet with OS bullseye
10:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2199.codfw.wmnet with reason: Maintenance
10:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2199.codfw.wmnet with reason: Maintenance
10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T370903)', diff saved to https://phabricator.wikimedia.org/P67705 and previous config saved to /var/cache/conftool/dbconfig/20240823-103006-ladsgroup.json
10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
10:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
10:19 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
10:17 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P67704 and previous config saved to /var/cache/conftool/dbconfig/20240823-101459-ladsgroup.json
10:14 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
10:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
10:09 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
10:07 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f00474004c0>
10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2012
10:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2012
10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2012.codfw.wmnet 67.0.192.10.in-addr.arpa 7.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:06 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2012.codfw.wmnet 67.0.192.10.in-addr.arpa 7.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2012 - cgoubert@cumin1002"
10:06 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2012 - cgoubert@cumin1002"
10:04 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
10:04 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
10:01 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:01 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f00474004c0>
10:00 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2012.codfw.wmnet with OS bullseye
10:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2012.codfw.wmnet
09:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P67702 and previous config saved to /var/cache/conftool/dbconfig/20240823-095952-ladsgroup.json
09:59 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2012.codfw.wmnet
09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f4f1d59cdf0>
09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2025
09:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2025
09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2025.codfw.wmnet 168.0.192.10.in-addr.arpa 8.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:49 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2025.codfw.wmnet 168.0.192.10.in-addr.arpa 8.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
09:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2025 - cgoubert@cumin1002"
09:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2025 - cgoubert@cumin1002"
09:49 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
09:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T370903)', diff saved to https://phabricator.wikimedia.org/P67701 and previous config saved to /var/cache/conftool/dbconfig/20240823-094445-ladsgroup.json
09:42 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
09:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:39 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f4f1d59cdf0>
09:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2025.codfw.wmnet with OS bullseye
09:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2025.codfw.wmnet
09:38 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2025.codfw.wmnet
09:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T371742)', diff saved to https://phabricator.wikimedia.org/P67700 and previous config saved to /var/cache/conftool/dbconfig/20240823-093050-ladsgroup.json
09:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
09:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
09:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T371742)', diff saved to https://phabricator.wikimedia.org/P67699 and previous config saved to /var/cache/conftool/dbconfig/20240823-093028-ladsgroup.json
09:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P67698 and previous config saved to /var/cache/conftool/dbconfig/20240823-091521-ladsgroup.json
09:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T370903)', diff saved to https://phabricator.wikimedia.org/P67697 and previous config saved to /var/cache/conftool/dbconfig/20240823-091251-ladsgroup.json
09:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
09:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
09:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T370903)', diff saved to https://phabricator.wikimedia.org/P67696 and previous config saved to /var/cache/conftool/dbconfig/20240823-091229-ladsgroup.json
09:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P67695 and previous config saved to /var/cache/conftool/dbconfig/20240823-090014-ladsgroup.json
08:59 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
08:59 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
08:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P67694 and previous config saved to /var/cache/conftool/dbconfig/20240823-085722-ladsgroup.json
08:54 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
08:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T371742)', diff saved to https://phabricator.wikimedia.org/P67693 and previous config saved to /var/cache/conftool/dbconfig/20240823-084506-ladsgroup.json
08:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P67692 and previous config saved to /var/cache/conftool/dbconfig/20240823-084214-ladsgroup.json
08:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T370903)', diff saved to https://phabricator.wikimedia.org/P67691 and previous config saved to /var/cache/conftool/dbconfig/20240823-082707-ladsgroup.json
08:17 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
08:08 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
07:58 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
07:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T370903)', diff saved to https://phabricator.wikimedia.org/P67690 and previous config saved to /var/cache/conftool/dbconfig/20240823-075415-ladsgroup.json
07:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
07:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
07:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T370903)', diff saved to https://phabricator.wikimedia.org/P67689 and previous config saved to /var/cache/conftool/dbconfig/20240823-075353-ladsgroup.json
07:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P67688 and previous config saved to /var/cache/conftool/dbconfig/20240823-073846-ladsgroup.json
07:27 godog: start prometheus1006 bookworm upgrade - T326657
07:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P67687 and previous config saved to /var/cache/conftool/dbconfig/20240823-072339-ladsgroup.json
07:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T370903)', diff saved to https://phabricator.wikimedia.org/P67686 and previous config saved to /var/cache/conftool/dbconfig/20240823-070832-ladsgroup.json
06:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T371742)', diff saved to https://phabricator.wikimedia.org/P67685 and previous config saved to /var/cache/conftool/dbconfig/20240823-065819-ladsgroup.json
06:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
06:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
06:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T371742)', diff saved to https://phabricator.wikimedia.org/P67684 and previous config saved to /var/cache/conftool/dbconfig/20240823-065756-ladsgroup.json
06:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P67683 and previous config saved to /var/cache/conftool/dbconfig/20240823-064249-ladsgroup.json
06:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T370903)', diff saved to https://phabricator.wikimedia.org/P67682 and previous config saved to /var/cache/conftool/dbconfig/20240823-063539-ladsgroup.json
06:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T370903)', diff saved to https://phabricator.wikimedia.org/P67681 and previous config saved to /var/cache/conftool/dbconfig/20240823-063502-ladsgroup.json
06:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P67680 and previous config saved to /var/cache/conftool/dbconfig/20240823-062742-ladsgroup.json
06:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set cephosd1005 to failed in Netbox - ayounsi@cumin1002"
06:19 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set cephosd1005 to failed in Netbox - ayounsi@cumin1002"
06:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P67679 and previous config saved to /var/cache/conftool/dbconfig/20240823-061954-ladsgroup.json
06:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T371742)', diff saved to https://phabricator.wikimedia.org/P67678 and previous config saved to /var/cache/conftool/dbconfig/20240823-061235-ladsgroup.json
06:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P67677 and previous config saved to /var/cache/conftool/dbconfig/20240823-060447-ladsgroup.json
05:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T370903)', diff saved to https://phabricator.wikimedia.org/P67676 and previous config saved to /var/cache/conftool/dbconfig/20240823-054940-ladsgroup.json
05:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T370903)', diff saved to https://phabricator.wikimedia.org/P67675 and previous config saved to /var/cache/conftool/dbconfig/20240823-051718-ladsgroup.json
05:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
05:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
04:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
04:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
04:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T370903)', diff saved to https://phabricator.wikimedia.org/P67674 and previous config saved to /var/cache/conftool/dbconfig/20240823-044132-ladsgroup.json
04:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P67673 and previous config saved to /var/cache/conftool/dbconfig/20240823-042625-ladsgroup.json
04:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T371742)', diff saved to https://phabricator.wikimedia.org/P67672 and previous config saved to /var/cache/conftool/dbconfig/20240823-042531-ladsgroup.json
04:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
04:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T371742)', diff saved to https://phabricator.wikimedia.org/P67671 and previous config saved to /var/cache/conftool/dbconfig/20240823-042454-ladsgroup.json
04:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P67670 and previous config saved to /var/cache/conftool/dbconfig/20240823-041118-ladsgroup.json
04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P67669 and previous config saved to /var/cache/conftool/dbconfig/20240823-040947-ladsgroup.json
03:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T370903)', diff saved to https://phabricator.wikimedia.org/P67668 and previous config saved to /var/cache/conftool/dbconfig/20240823-035611-ladsgroup.json
03:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P67667 and previous config saved to /var/cache/conftool/dbconfig/20240823-035439-ladsgroup.json
03:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T371742)', diff saved to https://phabricator.wikimedia.org/P67666 and previous config saved to /var/cache/conftool/dbconfig/20240823-033932-ladsgroup.json
03:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T370903)', diff saved to https://phabricator.wikimedia.org/P67665 and previous config saved to /var/cache/conftool/dbconfig/20240823-032642-ladsgroup.json
03:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
03:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
03:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T370903)', diff saved to https://phabricator.wikimedia.org/P67664 and previous config saved to /var/cache/conftool/dbconfig/20240823-032620-ladsgroup.json
03:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P67663 and previous config saved to /var/cache/conftool/dbconfig/20240823-031113-ladsgroup.json
02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P67662 and previous config saved to /var/cache/conftool/dbconfig/20240823-025605-ladsgroup.json
02:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T370903)', diff saved to https://phabricator.wikimedia.org/P67661 and previous config saved to /var/cache/conftool/dbconfig/20240823-024058-ladsgroup.json
02:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T370903)', diff saved to https://phabricator.wikimedia.org/P67660 and previous config saved to /var/cache/conftool/dbconfig/20240823-021231-ladsgroup.json
02:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2136.codfw.wmnet with reason: Maintenance
02:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2136.codfw.wmnet with reason: Maintenance
01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T371742)', diff saved to https://phabricator.wikimedia.org/P67659 and previous config saved to /var/cache/conftool/dbconfig/20240823-015417-ladsgroup.json
01:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
01:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
01:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
01:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
01:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T370903)', diff saved to https://phabricator.wikimedia.org/P67658 and previous config saved to /var/cache/conftool/dbconfig/20240823-014706-ladsgroup.json
01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P67657 and previous config saved to /var/cache/conftool/dbconfig/20240823-013158-ladsgroup.json
01:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P67656 and previous config saved to /var/cache/conftool/dbconfig/20240823-011651-ladsgroup.json
01:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T370903)', diff saved to https://phabricator.wikimedia.org/P67655 and previous config saved to /var/cache/conftool/dbconfig/20240823-010144-ladsgroup.json
00:49 krinkle@deploy1003: Finished deploy [integration/docroot@da4dac4]: (no justification provided) (duration: 00m 06s)
00:49 krinkle@deploy1003: Started deploy [integration/docroot@da4dac4]: (no justification provided)
00:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T370903)', diff saved to https://phabricator.wikimedia.org/P67653 and previous config saved to /var/cache/conftool/dbconfig/20240823-003815-ladsgroup.json
00:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1249.eqiad.wmnet with reason: Maintenance
00:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1249.eqiad.wmnet with reason: Maintenance
00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T370903)', diff saved to https://phabricator.wikimedia.org/P67652 and previous config saved to /var/cache/conftool/dbconfig/20240823-003753-ladsgroup.json
00:28 andrewbogott: rebooting puppetserver1003.eqiad.wmnet from mgmt console; it's unresponsive and causing puppet errors on clients.
00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67651 and previous config saved to /var/cache/conftool/dbconfig/20240823-002245-ladsgroup.json
00:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
00:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
00:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T371742)', diff saved to https://phabricator.wikimedia.org/P67650 and previous config saved to /var/cache/conftool/dbconfig/20240823-001219-ladsgroup.json
00:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67649 and previous config saved to /var/cache/conftool/dbconfig/20240823-000738-ladsgroup.json

2024-08-22

23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P67648 and previous config saved to /var/cache/conftool/dbconfig/20240822-235711-ladsgroup.json
23:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T370903)', diff saved to https://phabricator.wikimedia.org/P67647 and previous config saved to /var/cache/conftool/dbconfig/20240822-235231-ladsgroup.json
23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P67646 and previous config saved to /var/cache/conftool/dbconfig/20240822-234203-ladsgroup.json
23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T371742)', diff saved to https://phabricator.wikimedia.org/P67645 and previous config saved to /var/cache/conftool/dbconfig/20240822-232656-ladsgroup.json
22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T370903)', diff saved to https://phabricator.wikimedia.org/P67644 and previous config saved to /var/cache/conftool/dbconfig/20240822-224921-ladsgroup.json
22:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1248.eqiad.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1248.eqiad.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T370903)', diff saved to https://phabricator.wikimedia.org/P67643 and previous config saved to /var/cache/conftool/dbconfig/20240822-224859-ladsgroup.json
22:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67642 and previous config saved to /var/cache/conftool/dbconfig/20240822-223351-ladsgroup.json
22:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67641 and previous config saved to /var/cache/conftool/dbconfig/20240822-221844-ladsgroup.json
22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T370903)', diff saved to https://phabricator.wikimedia.org/P67640 and previous config saved to /var/cache/conftool/dbconfig/20240822-220337-ladsgroup.json
21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2127 (T371742)', diff saved to https://phabricator.wikimedia.org/P67639 and previous config saved to /var/cache/conftool/dbconfig/20240822-213909-ladsgroup.json
21:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T370903)', diff saved to https://phabricator.wikimedia.org/P67638 and previous config saved to /var/cache/conftool/dbconfig/20240822-213406-ladsgroup.json
21:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1247.eqiad.wmnet with reason: Maintenance
21:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1247.eqiad.wmnet with reason: Maintenance
21:32 brennen@deploy1003: Finished scap sync-world: Backport for Turn on Parsoid read views for cswikivoyage and rowikivoyage (T371353) (duration: 09m 36s)
21:28 brennen@deploy1003: brennen, cscott: Continuing with sync
21:27 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:26 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
21:25 brennen@deploy1003: brennen, cscott: Backport for Turn on Parsoid read views for cswikivoyage and rowikivoyage (T371353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:22 brennen@deploy1003: Started scap sync-world: Backport for Turn on Parsoid read views for cswikivoyage and rowikivoyage (T371353)
21:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1245.eqiad.wmnet with reason: Maintenance
21:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1245.eqiad.wmnet with reason: Maintenance
21:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T370903)', diff saved to https://phabricator.wikimedia.org/P67637 and previous config saved to /var/cache/conftool/dbconfig/20240822-210025-ladsgroup.json
20:47 mutante: dzahn@cumin2002 conftool action : set/pooled=no; selector: name=ml-serve2002.codfw.wmnet T365291
20:46 dzahn@cumin2002: conftool action : set/pooled=no; selector: name=ml-serve2002.codfw.wmnet
20:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P67636 and previous config saved to /var/cache/conftool/dbconfig/20240822-204518-ladsgroup.json
20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P67635 and previous config saved to /var/cache/conftool/dbconfig/20240822-203010-ladsgroup.json
20:26 cdanis@deploy1003: Finished scap sync-world: Backport for Revert "Invert logic on empty talk page" (T373100) (duration: 07m 16s)
20:21 cdanis@deploy1003: matmarex, cdanis: Continuing with sync
20:21 cdanis@deploy1003: matmarex, cdanis: Backport for Revert "Invert logic on empty talk page" (T373100) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:19 cdanis@deploy1003: Started scap sync-world: Backport for Revert "Invert logic on empty talk page" (T373100)
20:19 swfrench-wmf: imported wikidiff2_1.14.1-2+wmf11u2 into component/php81 - T372507
20:18 swfrench-wmf: imported php-wmerrors_2.0.0-1+wmf11u2 into component/php81 - T372507
20:17 swfrench-wmf: imported php-luasandbox_4.1.2-1+wmf11u2 into component/php81 - T372507
20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T370903)', diff saved to https://phabricator.wikimedia.org/P67634 and previous config saved to /var/cache/conftool/dbconfig/20240822-201503-ladsgroup.json
20:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
20:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
19:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T370903)', diff saved to https://phabricator.wikimedia.org/P67633 and previous config saved to /var/cache/conftool/dbconfig/20240822-194830-ladsgroup.json
19:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1244.eqiad.wmnet with reason: Maintenance
19:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1244.eqiad.wmnet with reason: Maintenance
19:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P67632 and previous config saved to /var/cache/conftool/dbconfig/20240822-194808-ladsgroup.json
19:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P67631 and previous config saved to /var/cache/conftool/dbconfig/20240822-193301-ladsgroup.json
19:31 ryankemper: T364368 Pooled wdqs2024 (its data transfer has completed successfully)
19:30 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-scholarly
19:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling neither afterwards
19:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P67630 and previous config saved to /var/cache/conftool/dbconfig/20240822-191754-ladsgroup.json
19:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P67629 and previous config saved to /var/cache/conftool/dbconfig/20240822-190247-ladsgroup.json
19:01 ryankemper: T364368 Pooled all wdqs main/scholarly hosts except wdqs2024, which won't be ready for another hour
19:01 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-scholarly
18:57 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: cluster=wdqs-main
18:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
18:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T371742)', diff saved to https://phabricator.wikimedia.org/P67628 and previous config saved to /var/cache/conftool/dbconfig/20240822-184628-ladsgroup.json
18:36 sukhe@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=97) Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
18:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling neither afterwards
18:36 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
18:36 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp40[37-40]* or cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
18:35 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 11s)
18:35 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
18:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P67627 and previous config saved to /var/cache/conftool/dbconfig/20240822-183120-ladsgroup.json
18:19 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on wdqs2024.codfw.wmnet with reason: needs a data transfer
18:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2024.codfw.wmnet with reason: needs a data transfer
18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P67626 and previous config saved to /var/cache/conftool/dbconfig/20240822-181613-ladsgroup.json
18:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P67625 and previous config saved to /var/cache/conftool/dbconfig/20240822-180230-ladsgroup.json
18:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
18:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
18:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T370903)', diff saved to https://phabricator.wikimedia.org/P67624 and previous config saved to /var/cache/conftool/dbconfig/20240822-180208-ladsgroup.json
18:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T371742)', diff saved to https://phabricator.wikimedia.org/P67623 and previous config saved to /var/cache/conftool/dbconfig/20240822-180106-ladsgroup.json
17:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P67622 and previous config saved to /var/cache/conftool/dbconfig/20240822-174701-ladsgroup.json
17:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet [reason: cookbook had failed as Puppet was disabled so pooling manually]
17:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P67621 and previous config saved to /var/cache/conftool/dbconfig/20240822-173153-ladsgroup.json
17:24 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp40[37-40]* or cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T371742)', diff saved to https://phabricator.wikimedia.org/P67620 and previous config saved to /var/cache/conftool/dbconfig/20240822-172404-ladsgroup.json
17:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
17:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
17:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T371742)', diff saved to https://phabricator.wikimedia.org/P67619 and previous config saved to /var/cache/conftool/dbconfig/20240822-172342-ladsgroup.json
17:20 sukhe: sudo cumin -b11 "A:cp" "run-puppet-agent" rolling out CR 1064797: T370294
17:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T370903)', diff saved to https://phabricator.wikimedia.org/P67618 and previous config saved to /var/cache/conftool/dbconfig/20240822-171646-ladsgroup.json
17:11 cdanis: 💙cdanis@cumin1002.eqiad.wmnet ~ 🕐☕ sudo ipmitool -I lanplus -H "puppetserver1002.mgmt.eqiad.wmnet" -U root -E chassis power cycle
17:10 topranks: removing no-longer-required vlans from ssw1-a1-codfw after lvs move T370927
17:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P67617 and previous config saved to /var/cache/conftool/dbconfig/20240822-170835-ladsgroup.json
16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P67616 and previous config saved to /var/cache/conftool/dbconfig/20240822-165328-ladsgroup.json
16:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T370903)', diff saved to https://phabricator.wikimedia.org/P67615 and previous config saved to /var/cache/conftool/dbconfig/20240822-164505-ladsgroup.json
16:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1242.eqiad.wmnet with reason: Maintenance
16:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1242.eqiad.wmnet with reason: Maintenance
16:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T370903)', diff saved to https://phabricator.wikimedia.org/P67614 and previous config saved to /var/cache/conftool/dbconfig/20240822-164443-ladsgroup.json
16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T371742)', diff saved to https://phabricator.wikimedia.org/P67613 and previous config saved to /var/cache/conftool/dbconfig/20240822-163819-ladsgroup.json
16:35 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2013.codfw.wmnet with OS bullseye
16:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P67612 and previous config saved to /var/cache/conftool/dbconfig/20240822-162936-ladsgroup.json
16:27 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin -b11 'A:cp' 'run-puppet-agent --enable "merging CR 1064782"'
16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P67611 and previous config saved to /var/cache/conftool/dbconfig/20240822-161429-ladsgroup.json
16:11 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
16:09 sukhe@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=1) Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
16:07 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
16:05 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'disable-puppet' 'merging CR 1064782'
16:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
16:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
16:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T371742)', diff saved to https://phabricator.wikimedia.org/P67610 and previous config saved to /var/cache/conftool/dbconfig/20240822-160131-ladsgroup.json
16:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
16:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T371742)', diff saved to https://phabricator.wikimedia.org/P67609 and previous config saved to /var/cache/conftool/dbconfig/20240822-160052-ladsgroup.json
15:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T370903)', diff saved to https://phabricator.wikimedia.org/P67608 and previous config saved to /var/cache/conftool/dbconfig/20240822-155921-ladsgroup.json
15:50 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host lvs2013.codfw.wmnet with OS bullseye
15:48 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1005.eqiad.wmnet with OS bookworm
15:46 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lvs2013.codfw.wmnet on all recursors
15:46 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache lvs2013.codfw.wmnet on all recursors
15:46 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lvs2014.codfw.wmnet on all recursors
15:46 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache lvs2014.codfw.wmnet on all recursors
15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P67607 and previous config saved to /var/cache/conftool/dbconfig/20240822-154544-ladsgroup.json
15:45 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:45 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for lvs2013 - cmooney@cumin1002"
15:45 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for lvs2013 - cmooney@cumin1002"
15:41 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:37 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not P{cp4044* or cp4052*} and A:cp for 9.2.5-1wm2
15:36 topranks: add vlans to trunk port on lsw1-c2-codfw facing new lvs2013 link T370927
15:36 sukhe: upgrading A:cp-ulsfo to ATS 9.2.5: T339134
15:31 topranks: disabling BGP on cr1-codfw and cr2-codfw towards lvs2013 in advance of host move to new switch T370927
15:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P67606 and previous config saved to /var/cache/conftool/dbconfig/20240822-153037-ladsgroup.json
15:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: move lvs2013 from asw to lsw
15:30 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: move lvs2013 from asw to lsw
15:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lsw1-c2-codfw.mgmt with reason: move lvs2013 from asw to lsw
15:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lsw1-c2-codfw.mgmt with reason: move lvs2013 from asw to lsw
15:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T370903)', diff saved to https://phabricator.wikimedia.org/P67605 and previous config saved to /var/cache/conftool/dbconfig/20240822-152620-ladsgroup.json
15:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1241.eqiad.wmnet with reason: Maintenance
15:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1241.eqiad.wmnet with reason: Maintenance
15:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T370903)', diff saved to https://phabricator.wikimedia.org/P67604 and previous config saved to /var/cache/conftool/dbconfig/20240822-152558-ladsgroup.json
15:22 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Hardware refresh
15:22 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Hardware refresh
15:21 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main2006.codfw.wmnet
15:21 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for kafka-main2006.codfw.wmnet
15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T371742)', diff saved to https://phabricator.wikimedia.org/P67603 and previous config saved to /var/cache/conftool/dbconfig/20240822-151530-ladsgroup.json
15:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P67602 and previous config saved to /var/cache/conftool/dbconfig/20240822-151050-ladsgroup.json
15:01 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
14:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P67601 and previous config saved to /var/cache/conftool/dbconfig/20240822-145543-ladsgroup.json
14:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
14:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
14:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
14:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
14:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T370903)', diff saved to https://phabricator.wikimedia.org/P67600 and previous config saved to /var/cache/conftool/dbconfig/20240822-144036-ladsgroup.json
14:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T371742)', diff saved to https://phabricator.wikimedia.org/P67599 and previous config saved to /var/cache/conftool/dbconfig/20240822-143655-ladsgroup.json
14:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
14:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
14:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T371742)', diff saved to https://phabricator.wikimedia.org/P67598 and previous config saved to /var/cache/conftool/dbconfig/20240822-143633-ladsgroup.json
14:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
14:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
14:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
14:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
14:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P67597 and previous config saved to /var/cache/conftool/dbconfig/20240822-142126-ladsgroup.json
14:19 MichaelG_WMF: T372333, with I431d2a checked out, running mwscript /home/migr/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=dewiki --dry-run --search-index --db-table
13:59 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
13:58 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
13:58 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
13:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P67593 and previous config saved to /var/cache/conftool/dbconfig/20240822-135731-ladsgroup.json
13:57 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
13:57 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:55 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
13:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1296.eqiad.wmnet with OS bullseye
13:54 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
13:53 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
13:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T371742)', diff saved to https://phabricator.wikimedia.org/P67592 and previous config saved to /var/cache/conftool/dbconfig/20240822-135111-ladsgroup.json
13:50 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:48 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
13:46 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
13:45 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
13:45 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
13:44 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
13:43 TheresNoTime: UTC afternoon backport window closed
13:42 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
13:42 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:42 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
13:42 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
13:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P67591 and previous config saved to /var/cache/conftool/dbconfig/20240822-134224-ladsgroup.json
13:41 jayme@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
13:41 jayme@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
13:38 samtar@deploy1003: Finished scap sync-world: Backport for knwikisource : Create flood flag and add file importer right to Admin user group (T373073) (duration: 08m 20s)
13:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
13:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
13:34 samtar@deploy1003: anzx, samtar: Continuing with sync
13:32 samtar@deploy1003: anzx, samtar: Backport for knwikisource : Create flood flag and add file importer right to Admin user group (T373073) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:30 samtar@deploy1003: Started scap sync-world: Backport for knwikisource : Create flood flag and add file importer right to Admin user group (T373073)
13:27 samtar@deploy1003: Finished scap sync-world: Backport for Use shellbox-video for videoscaling on group2 (T356241) (duration: 09m 10s)
13:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T370903)', diff saved to https://phabricator.wikimedia.org/P67590 and previous config saved to /var/cache/conftool/dbconfig/20240822-132717-ladsgroup.json
13:23 samtar@deploy1003: hnowlan, samtar: Continuing with sync
13:23 samtar@deploy1003: hnowlan, samtar: Backport for Use shellbox-video for videoscaling on group2 (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:18 samtar@deploy1003: Started scap sync-world: Backport for Use shellbox-video for videoscaling on group2 (T356241)
13:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
13:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1296.eqiad.wmnet with OS bullseye
13:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
13:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1298.eqiad.wmnet with OS bullseye
13:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T371742)', diff saved to https://phabricator.wikimedia.org/P67589 and previous config saved to /var/cache/conftool/dbconfig/20240822-131425-ladsgroup.json
13:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
13:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T371742)', diff saved to https://phabricator.wikimedia.org/P67588 and previous config saved to /var/cache/conftool/dbconfig/20240822-131402-ladsgroup.json
13:05 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1005.eqiad.wmnet with OS bookworm
12:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P67587 and previous config saved to /var/cache/conftool/dbconfig/20240822-125855-ladsgroup.json
12:55 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
12:54 cdanis@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
12:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P67586 and previous config saved to /var/cache/conftool/dbconfig/20240822-124348-ladsgroup.json
12:37 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
12:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T371742)', diff saved to https://phabricator.wikimedia.org/P67584 and previous config saved to /var/cache/conftool/dbconfig/20240822-122841-ladsgroup.json
12:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2011.codfw.wmnet
12:22 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2011.codfw.wmnet
12:18 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
12:17 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
12:17 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
12:15 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
12:12 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1005.eqiad.wmnet with OS bookworm
12:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2011.codfw.wmnet with OS bullseye
12:02 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:02 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:02 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:02 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
12:02 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:02 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:02 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:02 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
12:02 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:02 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
12:02 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:02 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:01 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:01 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T370903)', diff saved to https://phabricator.wikimedia.org/P67583 and previous config saved to /var/cache/conftool/dbconfig/20240822-120122-ladsgroup.json
12:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
12:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
12:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1221.eqiad.wmnet with reason: Maintenance
12:01 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:01 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1221.eqiad.wmnet with reason: Maintenance
12:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T370903)', diff saved to https://phabricator.wikimedia.org/P67582 and previous config saved to /var/cache/conftool/dbconfig/20240822-120053-ladsgroup.json
12:00 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:00 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:00 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:53 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:52 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
11:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T371742)', diff saved to https://phabricator.wikimedia.org/P67581 and previous config saved to /var/cache/conftool/dbconfig/20240822-115108-ladsgroup.json
11:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
11:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
11:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T371742)', diff saved to https://phabricator.wikimedia.org/P67580 and previous config saved to /var/cache/conftool/dbconfig/20240822-115047-ladsgroup.json
11:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P67579 and previous config saved to /var/cache/conftool/dbconfig/20240822-114546-ladsgroup.json
11:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
11:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1298.eqiad.wmnet with reason: host reimage
11:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1298.eqiad.wmnet with reason: host reimage
11:38 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
11:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P67578 and previous config saved to /var/cache/conftool/dbconfig/20240822-113540-ladsgroup.json
11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P67577 and previous config saved to /var/cache/conftool/dbconfig/20240822-113038-ladsgroup.json
11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7f32c7881dc0>
11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2011
11:25 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2011
11:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2011.codfw.wmnet 64.0.192.10.in-addr.arpa 4.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:25 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2011.codfw.wmnet 64.0.192.10.in-addr.arpa 4.6.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2011 - cgoubert@cumin1002"
11:24 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2011 - cgoubert@cumin1002"
11:24 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1298.eqiad.wmnet with OS bullseye
11:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
11:21 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P67576 and previous config saved to /var/cache/conftool/dbconfig/20240822-112033-ladsgroup.json
11:19 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f32c7881dc0>
11:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2011.codfw.wmnet with OS bullseye
11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T370903)', diff saved to https://phabricator.wikimedia.org/P67575 and previous config saved to /var/cache/conftool/dbconfig/20240822-111531-ladsgroup.json
11:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2011.codfw.wmnet
11:14 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2011.codfw.wmnet
11:12 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
11:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T371742)', diff saved to https://phabricator.wikimedia.org/P67574 and previous config saved to /var/cache/conftool/dbconfig/20240822-110526-ladsgroup.json
10:49 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
10:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T370903)', diff saved to https://phabricator.wikimedia.org/P67573 and previous config saved to /var/cache/conftool/dbconfig/20240822-104314-ladsgroup.json
10:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1199.eqiad.wmnet with reason: Maintenance
10:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1199.eqiad.wmnet with reason: Maintenance
10:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T370903)', diff saved to https://phabricator.wikimedia.org/P67572 and previous config saved to /var/cache/conftool/dbconfig/20240822-104252-ladsgroup.json
10:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P67571 and previous config saved to /var/cache/conftool/dbconfig/20240822-102744-ladsgroup.json
10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T371742)', diff saved to https://phabricator.wikimedia.org/P67570 and previous config saved to /var/cache/conftool/dbconfig/20240822-102613-ladsgroup.json
10:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:24 cgoubert@deploy1003: Finished scap sync-world: mediawiki: Get rid of obsolete extract2.php redirect - 1064723 - T373048 (duration: 05m 43s)
10:20 cgoubert@deploy1003: cgoubert: Continuing with sync
10:19 cgoubert@deploy1003: cgoubert: mediawiki: Get rid of obsolete extract2.php redirect - 1064723 - T373048 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:18 XioNoX: cr1-eqiad> request vmhost power-on other-routing-engine - T372781
10:18 cgoubert@deploy1003: Started scap sync-world: mediawiki: Get rid of obsolete extract2.php redirect - 1064723 - T373048
10:16 XioNoX: cr1-eqiad> request vmhost power-off other-routing-engine - T372781
10:15 XioNoX: cr1-eqiad> request vmhost snapshot recovery partition re0 - T372781
10:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P67569 and previous config saved to /var/cache/conftool/dbconfig/20240822-101237-ladsgroup.json
10:11 XioNoX: cr1-eqiad> request vmhost snapshot recovery re0 - T372781
10:10 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main2006.codfw.wmnet with reason: Hardware refresh
10:09 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main2006.codfw.wmnet with reason: Hardware refresh
09:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T370903)', diff saved to https://phabricator.wikimedia.org/P67568 and previous config saved to /var/cache/conftool/dbconfig/20240822-095730-ladsgroup.json
09:53 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
09:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
09:41 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Hardware refresh
09:41 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main2001.codfw.wmnet with reason: Hardware refresh
09:34 godog: start prometheus2006 bookworm upgrade - T326657
09:32 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
09:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T370903)', diff saved to https://phabricator.wikimedia.org/P67567 and previous config saved to /var/cache/conftool/dbconfig/20240822-092631-ladsgroup.json
09:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1190.eqiad.wmnet with reason: Maintenance
09:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1190.eqiad.wmnet with reason: Maintenance
09:24 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
09:20 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
09:13 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test2002.wikimedia.org
08:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
08:57 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
08:54 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
08:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
08:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
08:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
08:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
08:49 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp-test2002.wikimedia.org
08:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test1002.wikimedia.org
08:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
08:47 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
08:44 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
08:39 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp-test1002.wikimedia.org
08:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T371742)', diff saved to https://phabricator.wikimedia.org/P67566 and previous config saved to /var/cache/conftool/dbconfig/20240822-083706-ladsgroup.json
08:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P67565 and previous config saved to /var/cache/conftool/dbconfig/20240822-082158-ladsgroup.json
08:16 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.19 refs T366964
08:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P67564 and previous config saved to /var/cache/conftool/dbconfig/20240822-080651-ladsgroup.json
07:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T371742)', diff saved to https://phabricator.wikimedia.org/P67563 and previous config saved to /var/cache/conftool/dbconfig/20240822-075144-ladsgroup.json
07:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T371742)', diff saved to https://phabricator.wikimedia.org/P67562 and previous config saved to /var/cache/conftool/dbconfig/20240822-072836-ladsgroup.json
07:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
07:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
07:25 kartik@deploy1003: Finished scap sync-world: Backport for Enable Content/Section translation on WPs without MT (T361582) (duration: 07m 51s)
07:20 eileen: civicrm upgraded from 7dc4401a to 975fc66e
07:20 kartik@deploy1003: kartik: Continuing with sync
07:19 kartik@deploy1003: kartik: Backport for Enable Content/Section translation on WPs without MT (T361582) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:17 kartik@deploy1003: Started scap sync-world: Backport for Enable Content/Section translation on WPs without MT (T361582)
07:11 kartik@deploy1003: Finished scap sync-world: Backport for Content Translation: Revert MT threshold to default for Portuguese Wikipedia (T356356) (duration: 08m 01s)
07:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
07:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
07:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T371742)', diff saved to https://phabricator.wikimedia.org/P67561 and previous config saved to /var/cache/conftool/dbconfig/20240822-070708-ladsgroup.json
07:06 kartik@deploy1003: kartik: Continuing with sync
07:05 kartik@deploy1003: kartik: Backport for Content Translation: Revert MT threshold to default for Portuguese Wikipedia (T356356) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:03 kartik@deploy1003: Started scap sync-world: Backport for Content Translation: Revert MT threshold to default for Portuguese Wikipedia (T356356)
06:57 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4637
06:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 4637
06:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P67560 and previous config saved to /var/cache/conftool/dbconfig/20240822-065201-ladsgroup.json
06:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40317
06:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 40317
06:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P67559 and previous config saved to /var/cache/conftool/dbconfig/20240822-063653-ladsgroup.json
06:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T371742)', diff saved to https://phabricator.wikimedia.org/P67558 and previous config saved to /var/cache/conftool/dbconfig/20240822-062146-ladsgroup.json
06:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T371742)', diff saved to https://phabricator.wikimedia.org/P67557 and previous config saved to /var/cache/conftool/dbconfig/20240822-061202-ladsgroup.json
06:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
06:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67556 and previous config saved to /var/cache/conftool/dbconfig/20240822-061140-ladsgroup.json
05:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P67555 and previous config saved to /var/cache/conftool/dbconfig/20240822-055633-ladsgroup.json
05:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P67554 and previous config saved to /var/cache/conftool/dbconfig/20240822-054125-ladsgroup.json
05:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67553 and previous config saved to /var/cache/conftool/dbconfig/20240822-052618-ladsgroup.json
05:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67552 and previous config saved to /var/cache/conftool/dbconfig/20240822-051547-ladsgroup.json
05:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
05:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
05:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T371742)', diff saved to https://phabricator.wikimedia.org/P67551 and previous config saved to /var/cache/conftool/dbconfig/20240822-051536-ladsgroup.json
05:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P67550 and previous config saved to /var/cache/conftool/dbconfig/20240822-050027-ladsgroup.json
04:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P67549 and previous config saved to /var/cache/conftool/dbconfig/20240822-044520-ladsgroup.json
04:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T371742)', diff saved to https://phabricator.wikimedia.org/P67548 and previous config saved to /var/cache/conftool/dbconfig/20240822-043013-ladsgroup.json
04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T371742)', diff saved to https://phabricator.wikimedia.org/P67547 and previous config saved to /var/cache/conftool/dbconfig/20240822-040551-ladsgroup.json
04:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
04:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T371742)', diff saved to https://phabricator.wikimedia.org/P67546 and previous config saved to /var/cache/conftool/dbconfig/20240822-040529-ladsgroup.json
03:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P67545 and previous config saved to /var/cache/conftool/dbconfig/20240822-035022-ladsgroup.json
03:48 eileen: config revision changed from b1b3a1e6 to 69a40997
03:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P67544 and previous config saved to /var/cache/conftool/dbconfig/20240822-033514-ladsgroup.json
03:21 eileen: civicrm upgraded from b27307a9 to 7dc4401a
03:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T371742)', diff saved to https://phabricator.wikimedia.org/P67543 and previous config saved to /var/cache/conftool/dbconfig/20240822-032007-ladsgroup.json
03:10 eileen: civicrm upgraded from 3183c865 to b27307a9
02:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T371742)', diff saved to https://phabricator.wikimedia.org/P67542 and previous config saved to /var/cache/conftool/dbconfig/20240822-025529-ladsgroup.json
02:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
02:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
02:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
02:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
02:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T371742)', diff saved to https://phabricator.wikimedia.org/P67541 and previous config saved to /var/cache/conftool/dbconfig/20240822-025451-ladsgroup.json
02:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P67540 and previous config saved to /var/cache/conftool/dbconfig/20240822-023944-ladsgroup.json
02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P67539 and previous config saved to /var/cache/conftool/dbconfig/20240822-022437-ladsgroup.json
02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T371742)', diff saved to https://phabricator.wikimedia.org/P67538 and previous config saved to /var/cache/conftool/dbconfig/20240822-020930-ladsgroup.json
01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T371742)', diff saved to https://phabricator.wikimedia.org/P67537 and previous config saved to /var/cache/conftool/dbconfig/20240822-014441-ladsgroup.json
01:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
01:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T371742)', diff saved to https://phabricator.wikimedia.org/P67536 and previous config saved to /var/cache/conftool/dbconfig/20240822-014419-ladsgroup.json
01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2043.codfw.wmnet with OS bookworm
01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P67535 and previous config saved to /var/cache/conftool/dbconfig/20240822-012912-ladsgroup.json
01:26 eileen: civicrm upgraded from ed72cf6c to 3183c865
01:26 eileen: config revision changed from b1b3a1e6 to 69a40997
01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2044.codfw.wmnet with OS bookworm
01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2042.codfw.wmnet with OS bookworm
01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P67534 and previous config saved to /var/cache/conftool/dbconfig/20240822-011405-ladsgroup.json
01:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2041.codfw.wmnet with OS bookworm
01:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2040.codfw.wmnet with OS bookworm
01:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2042.codfw.wmnet with reason: host reimage
01:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2044.codfw.wmnet with reason: host reimage
01:00 eileen: civicrm upgraded from 3b22c823 to ed72cf6c
00:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T371742)', diff saved to https://phabricator.wikimedia.org/P67533 and previous config saved to /var/cache/conftool/dbconfig/20240822-005857-ladsgroup.json
00:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2043.codfw.wmnet with reason: host reimage
00:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2041.codfw.wmnet with reason: host reimage
00:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2044.codfw.wmnet with reason: host reimage
00:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2043.codfw.wmnet with reason: host reimage
00:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2042.codfw.wmnet with reason: host reimage
00:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2040.codfw.wmnet with reason: host reimage
00:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2041.codfw.wmnet with reason: host reimage
00:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2040.codfw.wmnet with reason: host reimage
00:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2044.codfw.wmnet with OS bookworm
00:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2043.codfw.wmnet with OS bookworm
00:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2042.codfw.wmnet with OS bookworm
00:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2041.codfw.wmnet with OS bookworm
00:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2040.codfw.wmnet with OS bookworm
00:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T371742)', diff saved to https://phabricator.wikimedia.org/P67532 and previous config saved to /var/cache/conftool/dbconfig/20240822-003352-ladsgroup.json
00:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
00:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
00:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T371742)', diff saved to https://phabricator.wikimedia.org/P67531 and previous config saved to /var/cache/conftool/dbconfig/20240822-003330-ladsgroup.json
00:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2040.codfw.wmnet with OS bookworm
00:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P67530 and previous config saved to /var/cache/conftool/dbconfig/20240822-001823-ladsgroup.json
00:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2040.codfw.wmnet with OS bookworm
00:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2040.codfw.wmnet with OS bookworm
00:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2040.codfw.wmnet with OS bookworm
00:12 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1298.eqiad.wmnet with OS bullseye
00:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1298.eqiad.wmnet with OS bullseye
00:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2044.codfw.wmnet with OS bookworm
00:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2043.codfw.wmnet with OS bookworm
00:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2044.codfw.wmnet with OS bookworm
00:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2042.codfw.wmnet with OS bookworm
00:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2041.codfw.wmnet with OS bookworm
00:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2040.codfw.wmnet with OS bookworm
00:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2043.codfw.wmnet with OS bookworm
00:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2042.codfw.wmnet with OS bookworm
00:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2041.codfw.wmnet with OS bookworm
00:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2040.codfw.wmnet with OS bookworm
00:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1298.eqiad.wmnet with OS bullseye
00:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1298.eqiad.wmnet with OS bullseye
00:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2038.codfw.wmnet with OS bookworm
00:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P67529 and previous config saved to /var/cache/conftool/dbconfig/20240822-000315-ladsgroup.json
00:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1298.eqiad.wmnet with OS bullseye
00:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2039.codfw.wmnet with OS bookworm
00:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"

2024-08-21

23:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2036.codfw.wmnet with OS bookworm
23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1298.eqiad.wmnet with OS bullseye
23:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67528 and previous config saved to /var/cache/conftool/dbconfig/20240821-235559-ladsgroup.json
23:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
23:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2037.codfw.wmnet with OS bookworm
23:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T371742)', diff saved to https://phabricator.wikimedia.org/P67527 and previous config saved to /var/cache/conftool/dbconfig/20240821-234808-ladsgroup.json
23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2038.codfw.wmnet with reason: host reimage
23:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2039.codfw.wmnet with reason: host reimage
23:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2036.codfw.wmnet with reason: host reimage
23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P67526 and previous config saved to /var/cache/conftool/dbconfig/20240821-234051-ladsgroup.json
23:40 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2039.codfw.wmnet with reason: host reimage
23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2037.codfw.wmnet with reason: host reimage
23:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2038.codfw.wmnet with reason: host reimage
23:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2036.codfw.wmnet with reason: host reimage
23:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2035.codfw.wmnet with OS bookworm
23:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2037.codfw.wmnet with reason: host reimage
23:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
23:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
23:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P67525 and previous config saved to /var/cache/conftool/dbconfig/20240821-232544-ladsgroup.json
23:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
23:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2039.codfw.wmnet with OS bookworm
23:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T371742)', diff saved to https://phabricator.wikimedia.org/P67524 and previous config saved to /var/cache/conftool/dbconfig/20240821-232341-ladsgroup.json
23:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
23:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2035.codfw.wmnet with reason: host reimage
23:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
23:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2038.codfw.wmnet with OS bookworm
23:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2037.codfw.wmnet with OS bookworm
23:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2036.codfw.wmnet with OS bookworm
23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2035.codfw.wmnet with reason: host reimage
23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67523 and previous config saved to /var/cache/conftool/dbconfig/20240821-231037-ladsgroup.json
23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67522 and previous config saved to /var/cache/conftool/dbconfig/20240821-230600-ladsgroup.json
23:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2213.codfw.wmnet with reason: Maintenance
23:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2213.codfw.wmnet with reason: Maintenance
23:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T370903)', diff saved to https://phabricator.wikimedia.org/P67521 and previous config saved to /var/cache/conftool/dbconfig/20240821-230549-ladsgroup.json
22:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
22:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T371742)', diff saved to https://phabricator.wikimedia.org/P67520 and previous config saved to /var/cache/conftool/dbconfig/20240821-225436-ladsgroup.json
22:52 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2035.codfw.wmnet with OS bookworm
22:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P67519 and previous config saved to /var/cache/conftool/dbconfig/20240821-225042-ladsgroup.json
22:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P67518 and previous config saved to /var/cache/conftool/dbconfig/20240821-223929-ladsgroup.json
22:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P67517 and previous config saved to /var/cache/conftool/dbconfig/20240821-223535-ladsgroup.json
22:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2022.codfw.wmnet w/ force delete existing files, repooling neither afterwards
22:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P67516 and previous config saved to /var/cache/conftool/dbconfig/20240821-222422-ladsgroup.json
22:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T370903)', diff saved to https://phabricator.wikimedia.org/P67515 and previous config saved to /var/cache/conftool/dbconfig/20240821-222028-ladsgroup.json
22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T370903)', diff saved to https://phabricator.wikimedia.org/P67514 and previous config saved to /var/cache/conftool/dbconfig/20240821-221450-ladsgroup.json
22:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2211.codfw.wmnet with reason: Maintenance
22:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2211.codfw.wmnet with reason: Maintenance
22:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2201.codfw.wmnet with reason: Maintenance
22:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2201.codfw.wmnet with reason: Maintenance
22:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T370903)', diff saved to https://phabricator.wikimedia.org/P67512 and previous config saved to /var/cache/conftool/dbconfig/20240821-220947-ladsgroup.json
22:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T371742)', diff saved to https://phabricator.wikimedia.org/P67511 and previous config saved to /var/cache/conftool/dbconfig/20240821-220915-ladsgroup.json
22:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling neither afterwards
21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T371742)', diff saved to https://phabricator.wikimedia.org/P67510 and previous config saved to /var/cache/conftool/dbconfig/20240821-215537-ladsgroup.json
21:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1231.eqiad.wmnet with reason: Maintenance
21:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1231.eqiad.wmnet with reason: Maintenance
21:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P67509 and previous config saved to /var/cache/conftool/dbconfig/20240821-215440-ladsgroup.json
21:42 amastilovic@deploy1003: Finished deploy [airflow-dags/wmde@109c99e]: (no justification provided) (duration: 00m 03s)
21:42 amastilovic@deploy1003: Started deploy [airflow-dags/wmde@109c99e]: (no justification provided)
21:42 amastilovic@deploy1003: Finished deploy [airflow-dags/search@109c99e]: (no justification provided) (duration: 00m 03s)
21:42 amastilovic@deploy1003: Started deploy [airflow-dags/search@109c99e]: (no justification provided)
21:42 amastilovic@deploy1003: Finished deploy [airflow-dags/analytics_product@1856d12]: (no justification provided) (duration: 00m 03s)
21:41 amastilovic@deploy1003: Started deploy [airflow-dags/analytics_product@1856d12]: (no justification provided)
21:41 amastilovic@deploy1003: Finished deploy [airflow-dags/research@109c99e]: (no justification provided) (duration: 00m 03s)
21:41 amastilovic@deploy1003: Started deploy [airflow-dags/research@109c99e]: (no justification provided)
21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P67508 and previous config saved to /var/cache/conftool/dbconfig/20240821-213932-ladsgroup.json
21:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2022.codfw.wmnet w/ force delete existing files, repooling neither afterwards
21:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
21:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
21:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T371742)', diff saved to https://phabricator.wikimedia.org/P67507 and previous config saved to /var/cache/conftool/dbconfig/20240821-213323-ladsgroup.json
21:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling neither afterwards
21:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T370903)', diff saved to https://phabricator.wikimedia.org/P67506 and previous config saved to /var/cache/conftool/dbconfig/20240821-212425-ladsgroup.json
21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T370903)', diff saved to https://phabricator.wikimedia.org/P67505 and previous config saved to /var/cache/conftool/dbconfig/20240821-212024-ladsgroup.json
21:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2192.codfw.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2192.codfw.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T370903)', diff saved to https://phabricator.wikimedia.org/P67504 and previous config saved to /var/cache/conftool/dbconfig/20240821-212002-ladsgroup.json
21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P67503 and previous config saved to /var/cache/conftool/dbconfig/20240821-211816-ladsgroup.json
21:11 amastilovic@deploy1003: Finished deploy [airflow-dags/analytics@1856d12]: (no justification provided) (duration: 01m 35s)
21:09 amastilovic@deploy1003: Started deploy [airflow-dags/analytics@1856d12]: (no justification provided)
21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P67502 and previous config saved to /var/cache/conftool/dbconfig/20240821-210455-ladsgroup.json
21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P67501 and previous config saved to /var/cache/conftool/dbconfig/20240821-210309-ladsgroup.json
21:00 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
20:58 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
20:58 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
20:57 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
20:57 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:54 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
20:53 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
20:53 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
20:52 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
20:52 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:51 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P67500 and previous config saved to /var/cache/conftool/dbconfig/20240821-204948-ladsgroup.json
20:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T371742)', diff saved to https://phabricator.wikimedia.org/P67499 and previous config saved to /var/cache/conftool/dbconfig/20240821-204802-ladsgroup.json
20:41 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:40 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T371742)', diff saved to https://phabricator.wikimedia.org/P67498 and previous config saved to /var/cache/conftool/dbconfig/20240821-203753-ladsgroup.json
20:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
20:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67497 and previous config saved to /var/cache/conftool/dbconfig/20240821-203731-ladsgroup.json
20:35 cjming: end of UTC late backport window
20:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T370903)', diff saved to https://phabricator.wikimedia.org/P67496 and previous config saved to /var/cache/conftool/dbconfig/20240821-203442-ladsgroup.json
20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T370903)', diff saved to https://phabricator.wikimedia.org/P67495 and previous config saved to /var/cache/conftool/dbconfig/20240821-203029-ladsgroup.json
20:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
20:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T370903)', diff saved to https://phabricator.wikimedia.org/P67494 and previous config saved to /var/cache/conftool/dbconfig/20240821-203007-ladsgroup.json
20:29 cjming@deploy1003: Finished scap sync-world: Backport for ve.ui.CodeMirrorAction.v6: use infinity viewport to avoid misalignment (T357482) (duration: 13m 14s)
20:25 cjming@deploy1003: musikanimal, cjming: Continuing with sync
20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P67492 and previous config saved to /var/cache/conftool/dbconfig/20240821-202224-ladsgroup.json
20:21 cjming@deploy1003: musikanimal, cjming: Backport for ve.ui.CodeMirrorAction.v6: use infinity viewport to avoid misalignment (T357482) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:16 cjming@deploy1003: Started scap sync-world: Backport for ve.ui.CodeMirrorAction.v6: use infinity viewport to avoid misalignment (T357482)
20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P67489 and previous config saved to /var/cache/conftool/dbconfig/20240821-201500-ladsgroup.json
20:14 bearloga@deploy1003: Finished deploy [airflow-dags/analytics_product@1856d12]: (no justification provided) (duration: 00m 03s)
20:14 bearloga@deploy1003: Started deploy [airflow-dags/analytics_product@1856d12]: (no justification provided)
20:12 bearloga@deploy1003: Finished deploy [airflow-dags/analytics_product@1856d12]: (no justification provided) (duration: 00m 17s)
20:11 bearloga@deploy1003: Started deploy [airflow-dags/analytics_product@1856d12]: (no justification provided)
20:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P67488 and previous config saved to /var/cache/conftool/dbconfig/20240821-200716-ladsgroup.json
19:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P67487 and previous config saved to /var/cache/conftool/dbconfig/20240821-195952-ladsgroup.json
19:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
19:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67486 and previous config saved to /var/cache/conftool/dbconfig/20240821-195209-ladsgroup.json
19:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T370903)', diff saved to https://phabricator.wikimedia.org/P67485 and previous config saved to /var/cache/conftool/dbconfig/20240821-194445-ladsgroup.json
19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T371742)', diff saved to https://phabricator.wikimedia.org/P67484 and previous config saved to /var/cache/conftool/dbconfig/20240821-194036-ladsgroup.json
19:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
19:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T371742)', diff saved to https://phabricator.wikimedia.org/P67483 and previous config saved to /var/cache/conftool/dbconfig/20240821-194014-ladsgroup.json
19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T370903)', diff saved to https://phabricator.wikimedia.org/P67482 and previous config saved to /var/cache/conftool/dbconfig/20240821-193843-ladsgroup.json
19:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
19:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T370903)', diff saved to https://phabricator.wikimedia.org/P67481 and previous config saved to /var/cache/conftool/dbconfig/20240821-193821-ladsgroup.json
19:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
19:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1017.eqiad.wmnet with OS bookworm
19:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P67480 and previous config saved to /var/cache/conftool/dbconfig/20240821-192507-ladsgroup.json
19:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P67479 and previous config saved to /var/cache/conftool/dbconfig/20240821-192314-ladsgroup.json
19:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P67478 and previous config saved to /var/cache/conftool/dbconfig/20240821-190959-ladsgroup.json
19:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1017.eqiad.wmnet with reason: host reimage
19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P67477 and previous config saved to /var/cache/conftool/dbconfig/20240821-190807-ladsgroup.json
19:06 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1017.eqiad.wmnet with reason: host reimage
18:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T371742)', diff saved to https://phabricator.wikimedia.org/P67476 and previous config saved to /var/cache/conftool/dbconfig/20240821-185452-ladsgroup.json
18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T370903)', diff saved to https://phabricator.wikimedia.org/P67475 and previous config saved to /var/cache/conftool/dbconfig/20240821-185300-ladsgroup.json
18:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1017.eqiad.wmnet with OS bookworm
18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T370903)', diff saved to https://phabricator.wikimedia.org/P67474 and previous config saved to /var/cache/conftool/dbconfig/20240821-184633-ladsgroup.json
18:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
18:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T370903)', diff saved to https://phabricator.wikimedia.org/P67473 and previous config saved to /var/cache/conftool/dbconfig/20240821-184611-ladsgroup.json
18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T371742)', diff saved to https://phabricator.wikimedia.org/P67472 and previous config saved to /var/cache/conftool/dbconfig/20240821-184427-ladsgroup.json
18:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
18:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T371742)', diff saved to https://phabricator.wikimedia.org/P67471 and previous config saved to /var/cache/conftool/dbconfig/20240821-184405-ladsgroup.json
18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bookworm
18:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1017.eqiad.wmnet with OS bookworm
18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1018.eqiad.wmnet with OS bookworm
18:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1020.eqiad.wmnet with OS bookworm
18:36 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bookworm
18:34 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P67470 and previous config saved to /var/cache/conftool/dbconfig/20240821-183104-ladsgroup.json
18:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P67469 and previous config saved to /var/cache/conftool/dbconfig/20240821-182858-ladsgroup.json
18:21 swfrench-wmf: imported php-memcached_3.2.0++-1+wmf11u1 into component/php81 - T372507
18:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage
18:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage
18:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P67468 and previous config saved to /var/cache/conftool/dbconfig/20240821-181556-ladsgroup.json
18:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage
18:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P67467 and previous config saved to /var/cache/conftool/dbconfig/20240821-181351-ladsgroup.json
18:13 swfrench-wmf: imported php-redis_6.0.2-1+wmf11u1 into component/php81 - T372507
18:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage
18:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage
18:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage
18:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage
18:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage
18:04 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T370903)', diff saved to https://phabricator.wikimedia.org/P67466 and previous config saved to /var/cache/conftool/dbconfig/20240821-180049-ladsgroup.json
17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T371742)', diff saved to https://phabricator.wikimedia.org/P67465 and previous config saved to /var/cache/conftool/dbconfig/20240821-175843-ladsgroup.json
17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T370903)', diff saved to https://phabricator.wikimedia.org/P67464 and previous config saved to /var/cache/conftool/dbconfig/20240821-175638-ladsgroup.json
17:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
17:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
17:56 swfrench-wmf: imported php-igbinary_3.2.15-1+wmf11u1 into component/php81 - T372507
17:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm
17:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bookworm
17:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bookworm
17:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1017.eqiad.wmnet with OS bookworm
17:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bookworm
17:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1245.eqiad.wmnet with reason: Maintenance
17:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1245.eqiad.wmnet with reason: Maintenance
17:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T371742)', diff saved to https://phabricator.wikimedia.org/P67463 and previous config saved to /var/cache/conftool/dbconfig/20240821-174750-ladsgroup.json
17:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
17:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
17:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T371742)', diff saved to https://phabricator.wikimedia.org/P67462 and previous config saved to /var/cache/conftool/dbconfig/20240821-174728-ladsgroup.json
17:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance
17:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance
17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67461 and previous config saved to /var/cache/conftool/dbconfig/20240821-174351-ladsgroup.json
17:39 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1005.eqiad.wmnet with OS bookworm
17:37 swfrench-wmf: imported xdebug_3.3.2-1+wmf11u1 into component/php81 - T372507
17:36 swfrench-wmf: imported wikidiff2_1.14.1-2+wmf11u1 into component/php81 - T372507
17:35 swfrench-wmf: imported tideways_5.0.4-16+wmf11u1 into component/php81 - T372507
17:35 ladsgroup@deploy1003: Finished scap sync-world: Backport for Change the disabled query page for commons (T369024) (duration: 07m 36s)
17:34 swfrench-wmf: imported php-yaml_2.2.3-2+wmf11u1 into component/php81 - T372507
17:34 swfrench-wmf: imported php-wmerrors_2.0.0-1+wmf11u1 into component/php81 - T372507
17:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P67460 and previous config saved to /var/cache/conftool/dbconfig/20240821-173221-ladsgroup.json
17:31 ladsgroup@deploy1003: ladsgroup: Continuing with sync
17:30 ladsgroup@deploy1003: ladsgroup: Backport for Change the disabled query page for commons (T369024) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P67459 and previous config saved to /var/cache/conftool/dbconfig/20240821-172844-ladsgroup.json
17:28 ladsgroup@deploy1003: Started scap sync-world: Backport for Change the disabled query page for commons (T369024)
17:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ganeti2036.codfw.wmnet
17:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1017.eqiad.wmnet with OS bookworm
17:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P67458 and previous config saved to /var/cache/conftool/dbconfig/20240821-171714-ladsgroup.json
17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P67457 and previous config saved to /var/cache/conftool/dbconfig/20240821-171337-ladsgroup.json
17:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T371742)', diff saved to https://phabricator.wikimedia.org/P67455 and previous config saved to /var/cache/conftool/dbconfig/20240821-170206-ladsgroup.json
16:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67454 and previous config saved to /var/cache/conftool/dbconfig/20240821-165829-ladsgroup.json
16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T370903)', diff saved to https://phabricator.wikimedia.org/P67453 and previous config saved to /var/cache/conftool/dbconfig/20240821-165415-ladsgroup.json
16:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1213.eqiad.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1213.eqiad.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67452 and previous config saved to /var/cache/conftool/dbconfig/20240821-165353-ladsgroup.json
16:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T371742)', diff saved to https://phabricator.wikimedia.org/P67451 and previous config saved to /var/cache/conftool/dbconfig/20240821-165027-ladsgroup.json
16:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
16:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
16:46 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ganeti2036.codfw.wmnet
16:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ganeti2036.codfw.wmnet
16:45 swfrench-wmf: imported php-pcov_1.0.11-5+wmf11u1 into component/php81 - T372507
16:45 swfrench-wmf: imported php-msgpack_2.2.0-4+wmf11u1 into component/php81 - T372507
16:44 swfrench-wmf: imported php-luasandbox_4.1.2-1+wmf11u1 into component/php81 - T372507
16:42 swfrench-wmf: imported php-imagick_3.7.0-6+wmf11u1 into component/php81 - T372507
16:41 swfrench-wmf: imported php-excimer_1.2.2-1+wmf11u1 into component/php81 - T372507
16:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2035.codfw.wmnet with OS bookworm
16:40 swfrench-wmf: imported php-apcu_5.1.23-1+wmf11u1 into component/php81 - T372507
16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P67450 and previous config saved to /var/cache/conftool/dbconfig/20240821-163846-ladsgroup.json
16:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P67449 and previous config saved to /var/cache/conftool/dbconfig/20240821-162339-ladsgroup.json
16:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67448 and previous config saved to /var/cache/conftool/dbconfig/20240821-160831-ladsgroup.json
16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T370903)', diff saved to https://phabricator.wikimedia.org/P67447 and previous config saved to /var/cache/conftool/dbconfig/20240821-160345-ladsgroup.json
16:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1210.eqiad.wmnet with reason: Maintenance
16:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1210.eqiad.wmnet with reason: Maintenance
16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T370903)', diff saved to https://phabricator.wikimedia.org/P67446 and previous config saved to /var/cache/conftool/dbconfig/20240821-160323-ladsgroup.json
15:57 MichaelG_WMF: T372333, with I431d2a checked out, running mwscript /home/migr/GrowthExperiments/maintenance/fixLinkRecommendationData.php --dry-run --wiki=testwiki --search-index --db-table
15:56 ejegg: fundraising python tools upgraded from 490a7b3f to 3f7b238d
15:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2035.codfw.wmnet with OS bookworm
15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P67445 and previous config saved to /var/cache/conftool/dbconfig/20240821-154815-ladsgroup.json
15:36 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P67444 and previous config saved to /var/cache/conftool/dbconfig/20240821-153306-ladsgroup.json
15:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T370903)', diff saved to https://phabricator.wikimedia.org/P67443 and previous config saved to /var/cache/conftool/dbconfig/20240821-151759-ladsgroup.json
15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T370903)', diff saved to https://phabricator.wikimedia.org/P67442 and previous config saved to /var/cache/conftool/dbconfig/20240821-151441-ladsgroup.json
15:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
15:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T370903)', diff saved to https://phabricator.wikimedia.org/P67441 and previous config saved to /var/cache/conftool/dbconfig/20240821-151419-ladsgroup.json
14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P67440 and previous config saved to /var/cache/conftool/dbconfig/20240821-145912-ladsgroup.json
14:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2024.mgmt.codfw.wmnet with reboot policy FORCED
14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67439 and previous config saved to /var/cache/conftool/dbconfig/20240821-144648-marostegui.json
14:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2163.codfw.wmnet with reason: Maintenance
14:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2163.codfw.wmnet with reason: Maintenance
14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T367856)', diff saved to https://phabricator.wikimedia.org/P67438 and previous config saved to /var/cache/conftool/dbconfig/20240821-144625-marostegui.json
14:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P67437 and previous config saved to /var/cache/conftool/dbconfig/20240821-144405-ladsgroup.json
14:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2024.mgmt.codfw.wmnet with reboot policy FORCED
14:41 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ganeti2036.codfw.wmnet
14:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2024']
14:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2024']
14:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P67434 and previous config saved to /var/cache/conftool/dbconfig/20240821-143118-marostegui.json
14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T370903)', diff saved to https://phabricator.wikimedia.org/P67433 and previous config saved to /var/cache/conftool/dbconfig/20240821-142858-ladsgroup.json
14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T370903)', diff saved to https://phabricator.wikimedia.org/P67432 and previous config saved to /var/cache/conftool/dbconfig/20240821-142536-ladsgroup.json
14:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
14:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T370903)', diff saved to https://phabricator.wikimedia.org/P67431 and previous config saved to /var/cache/conftool/dbconfig/20240821-142514-ladsgroup.json
14:22 topranks: enable PyBal on lvs2013 to swing traffic back from lvs2014
14:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2035.codfw.wmnet with OS bookworm
14:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P67430 and previous config saved to /var/cache/conftool/dbconfig/20240821-141611-marostegui.json
14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T367856)', diff saved to https://phabricator.wikimedia.org/P67428 and previous config saved to /var/cache/conftool/dbconfig/20240821-140104-marostegui.json
13:58 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
13:58 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
13:55 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P67427 and previous config saved to /var/cache/conftool/dbconfig/20240821-135458-ladsgroup.json
13:54 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
13:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:52 cdanis@deploy1003: Finished scap sync-world: Backport for [arwikinews]: Upgrade license to CC BY-SA 4.0 (T372730) (duration: 10m 05s)
13:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:47 cdanis@deploy1003: anwon, cdanis: Continuing with sync
13:44 cdanis@deploy1003: anwon, cdanis: Backport for [arwikinews]: Upgrade license to CC BY-SA 4.0 (T372730) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:42 cdanis@deploy1003: Started scap sync-world: Backport for [arwikinews]: Upgrade license to CC BY-SA 4.0 (T372730)
13:40 cdanis@deploy1003: Finished scap sync-world: Backport for Enable shellbox-video for enwiki (T356241) (duration: 07m 18s)
13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T370903)', diff saved to https://phabricator.wikimedia.org/P67426 and previous config saved to /var/cache/conftool/dbconfig/20240821-133950-ladsgroup.json
13:39 topranks: disable PyBal on lvs2013 to switch traffic to lvs2014
13:35 cdanis@deploy1003: hnowlan, cdanis: Continuing with sync
13:35 cdanis@deploy1003: hnowlan, cdanis: Backport for Enable shellbox-video for enwiki (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T370903)', diff saved to https://phabricator.wikimedia.org/P67425 and previous config saved to /var/cache/conftool/dbconfig/20240821-133411-ladsgroup.json
13:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1183.eqiad.wmnet with reason: Maintenance
13:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1183.eqiad.wmnet with reason: Maintenance
13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T370903)', diff saved to https://phabricator.wikimedia.org/P67424 and previous config saved to /var/cache/conftool/dbconfig/20240821-133349-ladsgroup.json
13:33 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: test failover lvs2013 to lvs2014
13:33 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: test failover lvs2013 to lvs2014
13:32 cdanis@deploy1003: Started scap sync-world: Backport for Enable shellbox-video for enwiki (T356241)
13:31 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr[1-2]-codfw with reason: test failover lvs2013 to lvs2014
13:31 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr[1-2]-codfw with reason: test failover lvs2013 to lvs2014
13:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2035.codfw.wmnet with OS bookworm
13:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P67423 and previous config saved to /var/cache/conftool/dbconfig/20240821-131842-ladsgroup.json
13:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "mgmt: add role - ayounsi@cumin1002"
13:16 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "mgmt: add role - ayounsi@cumin1002"
13:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P67422 and previous config saved to /var/cache/conftool/dbconfig/20240821-130335-ladsgroup.json
12:59 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1005.eqiad.wmnet with OS bookworm
12:53 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
12:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T370903)', diff saved to https://phabricator.wikimedia.org/P67421 and previous config saved to /var/cache/conftool/dbconfig/20240821-124828-ladsgroup.json
12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T370903)', diff saved to https://phabricator.wikimedia.org/P67420 and previous config saved to /var/cache/conftool/dbconfig/20240821-124252-ladsgroup.json
12:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
12:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
12:34 XioNoX: add python3-pynetbox_7.4.0_all.deb to reprepro - T371890
12:23 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
12:22 XioNoX: install python3-pynetbox_7.4.0 manually on cumin2002
12:22 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1005.eqiad.wmnet with OS bookworm
12:14 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bookworm
12:07 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
11:53 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
11:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:01 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
10:55 stran@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
10:54 stran@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
10:53 stran@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
10:52 stran@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
10:50 stran@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
10:48 stran@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
10:48 stran@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
10:47 stran@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
10:11 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
10:04 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
09:51 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
09:44 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
09:41 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
09:34 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
09:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262725
09:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262725
09:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28173
09:13 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 28173
09:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67418 and previous config saved to /var/cache/conftool/dbconfig/20240821-090421-root.json
08:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67417 and previous config saved to /var/cache/conftool/dbconfig/20240821-084915-root.json
08:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67416 and previous config saved to /var/cache/conftool/dbconfig/20240821-083410-root.json
08:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67415 and previous config saved to /var/cache/conftool/dbconfig/20240821-081904-root.json
08:15 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.19 refs T366964
08:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67414 and previous config saved to /var/cache/conftool/dbconfig/20240821-080359-root.json
07:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67413 and previous config saved to /var/cache/conftool/dbconfig/20240821-074854-root.json
07:42 XioNoX: rollback JIO_DIRECT from cr2-eqsin AVOID-PATHS
07:39 XioNoX: enable cloudsw1-d5-eqiad:xe-0/0/21 (SFP now inserted)
07:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67412 and previous config saved to /var/cache/conftool/dbconfig/20240821-073348-root.json
07:27 brouberol@cumin1002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for wdqs1024.eqiad.wmnet: Renew puppet certificate - brouberol@cumin1002
07:27 brouberol@cumin1002: START - Cookbook sre.puppet.renew-cert for wdqs1024.eqiad.wmnet: Renew puppet certificate - brouberol@cumin1002
07:02 XioNoX: remove bgp session to mw2291 on codfw routers (host renumbered)
06:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67411 and previous config saved to /var/cache/conftool/dbconfig/20240821-065624-arnaudb.json
06:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67410 and previous config saved to /var/cache/conftool/dbconfig/20240821-064119-arnaudb.json
06:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67409 and previous config saved to /var/cache/conftool/dbconfig/20240821-062613-arnaudb.json
06:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67408 and previous config saved to /var/cache/conftool/dbconfig/20240821-061108-arnaudb.json
05:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 15%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67407 and previous config saved to /var/cache/conftool/dbconfig/20240821-055602-arnaudb.json
05:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: post backup w/o prefetch repooling', diff saved to https://phabricator.wikimedia.org/P67406 and previous config saved to /var/cache/conftool/dbconfig/20240821-054057-arnaudb.json
01:56 eileen: config revision changed from 3ef2ec32 to b1b3a1e6
01:27 eileen: config revision changed from f569b590 to 3ef2ec32 disable jobs to run index-add
00:51 eileen: civicrm upgraded from 1022abf1 to 3b22c823

2024-08-20

22:44 rzl@cumin1002: dbctl commit (dc=all): 'db1206 depooled', diff saved to https://phabricator.wikimedia.org/P67402 and previous config saved to /var/cache/conftool/dbconfig/20240820-224431-rzl.json
22:02 dancy@deploy1003: Installing scap version "4.99.0" for 210 hosts
21:45 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
21:13 cjming: end of UTC late backport window
21:12 cjming@deploy1003: Finished scap sync-world: Backport for Revert "kaawiktionary: add custom logos" (duration: 08m 18s)
21:07 cjming@deploy1003: trainbranchbot, cjming: Continuing with sync
21:07 cjming@deploy1003: trainbranchbot, cjming: Backport for Revert "kaawiktionary: add custom logos" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:04 swfrench-wmf: imported dh-php_5.4+wmf11u1 into component/php81 - T372507
21:03 cjming@deploy1003: Started scap sync-world: Backport for Revert "kaawiktionary: add custom logos"
21:01 cjming@deploy1003: Sync cancelled.
20:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
20:57 cjming@deploy1003: cjming, chlod: Backport for kaawiktionary: add custom logos (T368868) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:55 cjming@deploy1003: Started scap sync-world: Backport for kaawiktionary: add custom logos (T368868)
20:54 cjming@deploy1003: Finished scap sync-world: Backport for Revert "kawikisource: add custom logos" (duration: 08m 53s)
20:53 swfrench-wmf: imported php-defaults_92+wmf11u1 into component/php81 - T372507
20:52 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
20:49 cjming@deploy1003: cjming, trainbranchbot: Continuing with sync
20:49 cjming@deploy1003: cjming, trainbranchbot: Backport for Revert "kawikisource: add custom logos" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:45 cjming@deploy1003: Started scap sync-world: Backport for Revert "kawikisource: add custom logos"
20:41 cjming@deploy1003: Sync cancelled.
20:39 cjming@deploy1003: cjming, chlod: Backport for kawikisource: add custom logos (T368868) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:36 cjming@deploy1003: Started scap sync-world: Backport for kawikisource: add custom logos (T368868)
20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox
20:32 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
20:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd2004.codfw.wmnet with OS bookworm
20:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:31 swfrench-wmf: imported php8.1_8.1.29-1+wmf11u1 into component/php81 - T372507
20:26 cmooney@cumin1002: START - Cookbook sre.dns.netbox
20:05 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
20:05 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2024.codfw.wmnet']
20:04 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2024.codfw.wmnet']
20:02 Emperor: depool/restart/repool ms-fe2014 T360913
20:02 Emperor: depool/restart/repool ms-fe2012 T360913
20:01 Emperor: depool/restart/repool ms-fe2011 T360913
20:00 Emperor: depool/restart/repool ms-fe2009 T360913
19:58 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2024.codfw.wmnet']
19:35 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2024.codfw.wmnet']
19:16 topranks: restarting netbox service on netbox1003 to update script
18:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1017.eqiad.wmnet with OS bookworm
18:24 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
18:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2014.codfw.wmnet with OS bullseye
17:51 swfrench-wmf: mediawiki statsd exporter deployments upgraded to bookworm-based image - T368366
17:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
17:45 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1017.eqiad.wmnet with OS bookworm
17:44 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
17:44 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
17:44 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:44 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:44 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
17:44 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
17:44 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
17:43 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
17:43 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
17:43 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
17:43 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
17:43 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:43 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:37 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
17:32 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "recover wdqs2024 from failed status T372919 - bking@cumin2002"
17:32 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
17:31 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
17:31 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:31 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:31 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
17:31 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
17:31 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
17:30 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-misc: apply
17:30 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
17:30 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
17:30 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:30 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "recover wdqs2024 from failed status T372919 - bking@cumin2002"
17:16 topranks: removing config for ssw1-a8-codfw link to lvs2014 T370897
17:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:06 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:06 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:02 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:02 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd2004.codfw.wmnet with reason: host reimage
16:56 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
16:55 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd2004.codfw.wmnet with reason: host reimage
16:51 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lvs2013.codfw.wmnet on all recursors
16:51 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache lvs2013.codfw.wmnet on all recursors
16:50 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for lvs2014 - cmooney@cumin1002"
16:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for lvs2014 - cmooney@cumin1002"
16:43 cmooney@cumin1002: START - Cookbook sre.dns.netbox
16:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2040.codfw.wmnet
16:41 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2040.codfw.wmnet
16:41 claime: Pooling wikikube-worker2040.codfw.wmnet - T351074
16:40 topranks: adding vlans to lsw1-d2-codfw for lvs2014 T370897
16:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-sd2004.codfw.wmnet with OS bookworm
16:38 claime: Running homer 'lsw1-a3-codfw*' commit 'T351074'
16:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-sd2004']
16:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-sd2004']
16:28 mutante: LDAP - removed htriedman from wmf group, added htriedman to nda group (T371644)
16:26 topranks: disabling BGP to PyBal on lvs2014 in preparation for move to new switch T370897
16:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on lvs2014.codfw.wmnet with reason: move lvs2014 from asw to lsw
16:24 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on lvs2014.codfw.wmnet with reason: move lvs2014 from asw to lsw
16:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on lsw1-d2-codfw.mgmt with reason: move lvs2014 from asw to lsw
16:23 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on lsw1-d2-codfw.mgmt with reason: move lvs2014 from asw to lsw
16:22 topranks: beginning work to reimage lvs2014 onto per-rack vlan in codfw rack D2 and move to new switch T370897
16:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-sd2004.mgmt.codfw.wmnet with reboot policy FORCED
16:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2040.codfw.wmnet with OS bullseye
16:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-sd2004.mgmt.codfw.wmnet with reboot policy FORCED
16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding logging-sd2004 to codfw - jhancock@cumin2002"
16:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding logging-sd2004 to codfw - jhancock@cumin2002"
16:05 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
16:05 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
15:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2040.codfw.wmnet with reason: host reimage
15:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2040.codfw.wmnet with reason: host reimage
15:52 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
15:52 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
15:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
15:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
15:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host <spicerack.netbox.NetboxServer object at 0x7fb7528f2580>
15:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2040
15:35 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2040
15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2040.codfw.wmnet 161.0.192.10.in-addr.arpa 1.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:34 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker2040.codfw.wmnet 161.0.192.10.in-addr.arpa 1.6.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2040 - cgoubert@cumin1002"
15:34 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2040 - cgoubert@cumin1002"
15:28 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fb7528f2580>
15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2040.codfw.wmnet with OS bullseye
15:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bookworm
15:24 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s1
15:24 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s3
15:24 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1013.eqiad.wmnet with OS bookworm
15:23 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
15:22 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
15:21 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
15:21 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
15:17 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2040.codfw.wmnet with OS bullseye
15:17 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host <spicerack.netbox.NetboxServer object at 0x7fcd02d21d60>
15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7fcd02d21d60>
15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2040.codfw.wmnet with OS bullseye
15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2291 to wikikube-worker2040
15:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2040
15:13 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2040
15:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2291 to wikikube-worker2040 - cgoubert@cumin1002"
15:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:12 brennen@deploy1003: Finished deploy [phabricator/deployment@89f5014]: deploy phab1004 for T372898 (duration: 00m 31s)
15:11 brennen@deploy1003: Started deploy [phabricator/deployment@89f5014]: deploy phab1004 for T372898
15:11 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2291 to wikikube-worker2040 - cgoubert@cumin1002"
15:11 XioNoX: deploy pfw policy update 1724083328 - T372792
15:10 brennen@deploy1003: Finished deploy [phabricator/deployment@89f5014]: deploy phab2002 for T372898 (test redux) (duration: 01m 22s)
15:09 brennen@deploy1003: Started deploy [phabricator/deployment@89f5014]: deploy phab2002 for T372898 (test redux)
15:07 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:07 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2291 to wikikube-worker2040
15:05 brennen@deploy1003: Finished deploy [phabricator/deployment@89f5014]: deploy phab2002 for T372898 (duration: 00m 33s)
15:04 brennen@deploy1003: Started deploy [phabricator/deployment@89f5014]: deploy phab2002 for T372898
15:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
15:04 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
15:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2291.codfw.wmnet
15:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:04 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:03 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host mw2291.codfw.wmnet
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:00 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
14:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2035.codfw.wmnet with OS bookworm
14:45 claime: Depooling mw2291.codfw.wmnet for rename and ip renumbering - T372878
14:43 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
14:42 klausman@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
14:38 klausman@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
14:37 klausman@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
14:37 klausman@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
14:32 klausman@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
14:31 klausman@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
14:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:24 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1013.eqiad.wmnet with reason: host reimage
14:22 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1013.eqiad.wmnet with reason: host reimage
14:22 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:22 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:03 mforns@deploy1003: Finished deploy [airflow-dags/analytics@c202679]: (no justification provided) (duration: 00m 51s)
14:02 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:02 mforns@deploy1003: Started deploy [airflow-dags/analytics@c202679]: (no justification provided)
14:02 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
13:59 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s7
13:59 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s2
13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:59 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1014.eqiad.wmnet with OS bookworm
13:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
13:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
13:54 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
13:53 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
13:51 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
13:50 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:31 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1014.eqiad.wmnet with reason: host reimage
13:28 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1014.eqiad.wmnet with reason: host reimage
13:15 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1014.eqiad.wmnet with OS bookworm
13:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1299.eqiad.wmnet with OS bullseye
13:12 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
13:06 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1014.eqiad.wmnet with reason: Reimaging clouddb1014 T365424
13:06 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1014.eqiad.wmnet with reason: Reimaging clouddb1014 T365424
13:05 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s2
13:05 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s7
12:59 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
12:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:37 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
12:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:33 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
12:28 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
12:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
12:26 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
12:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:19 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Fix DeletedContributions for user names containing spaces (T372444), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413) (duration: 11m 38s)
12:15 dreamyjazz@deploy1003: dreamyjazz, samtar: Continuing with sync
12:12 dreamyjazz@deploy1003: dreamyjazz, samtar: Backport for Fix DeletedContributions for user names containing spaces (T372444), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:08 dreamyjazz@deploy1003: Started scap sync-world: Backport for Fix DeletedContributions for user names containing spaces (T372444), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413), Allow ContributionsSpecialPage to accept usemodwiki IP addresses (T370413)
09:42 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:42 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:42 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
09:41 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
09:41 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:40 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:40 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:39 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:39 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:37 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
09:36 claime: Deploying calico configuration for codfw row c/d lsw - 1062728
09:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:15 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.19 refs T366964
08:15 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
08:04 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
07:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1022.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
07:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Update Netbox wheels - ayounsi@cumin1002 - T371890
07:14 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Update Netbox wheels - ayounsi@cumin1002 - T371890
06:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Update Netbox-next wheels - ayounsi@cumin1002 - T371890
06:47 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Update Netbox-next wheels - ayounsi@cumin1002 - T371890
06:43 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 18:00:00 on wdqs[2021-2023,2025].codfw.wmnet with reason: T364368 non-prod hosts
06:43 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on wdqs[2021-2023,2025].codfw.wmnet with reason: T364368 non-prod hosts
06:43 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 05s)
06:42 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
06:40 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1022.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
06:36 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 13s)
06:36 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
05:22 marostegui: Deploy schema change on s1 eqiad old master db1184 dbmaint T367856
05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1184 T372524', diff saved to https://phabricator.wikimedia.org/P67395 and previous config saved to /var/cache/conftool/dbconfig/20240820-051948-marostegui.json
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1163 to s1 primary and set section read-write T372524', diff saved to https://phabricator.wikimedia.org/P67394 and previous config saved to /var/cache/conftool/dbconfig/20240820-051843-marostegui.json
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T372524', diff saved to https://phabricator.wikimedia.org/P67393 and previous config saved to /var/cache/conftool/dbconfig/20240820-051821-root.json
05:18 marostegui: Starting s1 eqiad failover from db1184 to db1163 - T372524
05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1163 with weight 0 T372524', diff saved to https://phabricator.wikimedia.org/P67392 and previous config saved to /var/cache/conftool/dbconfig/20240820-051726-marostegui.json
05:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1184.eqiad.wmnet with reason: Long schema change
05:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1184.eqiad.wmnet with reason: Long schema change
04:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T372524
04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1163 with weight 0 T372524', diff saved to https://phabricator.wikimedia.org/P67391 and previous config saved to /var/cache/conftool/dbconfig/20240820-045212-root.json
04:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T372524
04:00 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.16 (duration: 00m 56s)
03:49 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.43.0-wmf.19 refs T366964 (duration: 46m 32s)
03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.19 refs T366964
00:21 mutante: previous message about prometheus can be ignored - race condition that solved itself on next puppet run
00:04 mutante: prometheus3003/prometheus1006 - are trying to use puppetserver1002 but get connection refused from puppetserver1001.eqiad.wmnet port 8140 - causing other puppet errors

2024-08-19

23:59 mutante: prometheus - puppet on prometheus hosts very slow - reason appears to be that /srv/prometheus is recursively managed by puppet but has ~ 20x more files than the default soft limit of 1000
23:55 mutante: prometheus - switched ferm::service to firewall::service (gerrit:1057952) - NOOP except /etc/ferm/conf.d/10_prometheus-web becomes /etc/ferm/conf.d/10_prometheus_web with identical rules
23:15 ejegg: fundraising civicrm upgraded from fd01c939 to 1022abf1
22:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
22:12 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
22:09 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
21:50 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
21:48 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
21:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
21:26 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
21:07 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
21:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
20:57 eevans@deploy1003: Finished deploy [restbase/deploy@b504108] (beta): Dry run beta deployment test (duration: 00m 06s)
20:57 eevans@deploy1003: Started deploy [restbase/deploy@b504108] (beta): Dry run beta deployment test
20:52 sbassett: Deployed changes from T372570 to security.wikimedia.org (miscweb)
20:49 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
20:49 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
20:49 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
20:49 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
20:49 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
20:48 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
20:46 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
20:45 eevans@deploy1003: Finished deploy [restbase/deploy@b504108] (beta): Dry run beta deployment test (duration: 00m 32s)
20:45 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
20:45 eevans@deploy1003: Started deploy [restbase/deploy@b504108] (beta): Dry run beta deployment test
20:44 mforns@deploy1003: Finished deploy [airflow-dags/analytics_test@3ec5119]: (no justification provided) (duration: 00m 11s)
20:44 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
20:44 mforns@deploy1003: Started deploy [airflow-dags/analytics_test@3ec5119]: (no justification provided)
20:42 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
20:26 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
20:26 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
20:00 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
19:59 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
19:54 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
19:53 dancy@deploy1003: Started scap sync-world: testing T371904
19:52 dancy@deploy1003: Installation of scap version "4.98.0" completed for 207 hosts
19:52 dancy@deploy1003: Installing scap version "4.98.0" for 207 hosts
19:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
19:45 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
19:45 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
19:29 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
19:29 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
19:28 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
19:28 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
19:07 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
19:06 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
19:04 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
18:37 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
18:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
18:29 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:22 ejegg: fundraising civicrm upgraded from 56521963 to fd01c939
18:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
18:13 mforns@deploy1003: Finished deploy [analytics/refinery@9eaecec] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9eaecec7] (duration: 03m 24s)
18:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1299.eqiad.wmnet with reason: host reimage
18:12 Lucas_WMDE: FINISHED lucaswerkmeister-wmde@mwmaint1002:~$ foreachwiki maintenance/cleanupTitles.php --prefix=T195546 --reporting-interval=1000000000 2>&1 | tee ~/T195546.log
18:10 mforns@deploy1003: Started deploy [analytics/refinery@9eaecec] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9eaecec7]
18:09 mforns@deploy1003: Finished deploy [analytics/refinery@9eaecec] (thin): Regular analytics weekly train THIN [analytics/refinery@9eaecec7] (duration: 04m 25s)
18:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1299.eqiad.wmnet with reason: host reimage
18:05 mforns@deploy1003: Started deploy [analytics/refinery@9eaecec] (thin): Regular analytics weekly train THIN [analytics/refinery@9eaecec7]
17:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
17:55 mforns@deploy1003: Finished deploy [analytics/refinery@9eaecec]: Regular analytics weekly train [analytics/refinery@9eaecec7] (duration: 12m 30s)
17:53 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
17:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1299.eqiad.wmnet with OS bullseye
17:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1286.eqiad.wmnet with OS bullseye
17:50 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1285.eqiad.wmnet with OS bullseye
17:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2023.codfw.wmnet with OS bullseye
17:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2025.codfw.wmnet with OS bullseye
17:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2022.codfw.wmnet with OS bullseye
17:42 mforns@deploy1003: Started deploy [analytics/refinery@9eaecec]: Regular analytics weekly train [analytics/refinery@9eaecec7]
17:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:38 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:38 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:36 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:36 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
17:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
17:29 swfrench-wmf: statsd-exporter resource bumps (https://gerrit.wikimedia.org/r/1061856) are now everywhere - T371885
17:27 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
17:27 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
17:27 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
17:27 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:27 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
17:26 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
17:26 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
17:26 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
17:26 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
17:26 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
17:26 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:25 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:25 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:25 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
17:20 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
17:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-misc: apply
17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:13 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:11 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
17:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:08 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd2002.codfw.wmnet with OS bookworm
17:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bullseye
17:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bullseye
16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1304.eqiad.wmnet with OS bullseye
16:57 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1303.eqiad.wmnet with OS bullseye
16:57 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1302.eqiad.wmnet with OS bullseye
16:57 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1301.eqiad.wmnet with OS bullseye
16:57 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1300.eqiad.wmnet with OS bullseye
16:57 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:47 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bullseye
16:42 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
16:42 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet, repooling neither afterwards
16:41 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet, repooling neither afterwards
16:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd2002.codfw.wmnet with reason: host reimage
16:38 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet, repooling neither afterwards
16:37 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd2002.codfw.wmnet with reason: host reimage
16:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet [reason: [done] T372160]
16:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
16:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for Reduce rate-limit for trusted editors of commons to 1500 every 3m (T370304) (duration: 06m 33s)
16:25 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
16:23 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2025.codfw.wmnet with OS bullseye
16:23 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
16:23 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2023.codfw.wmnet with OS bullseye
16:23 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2022.codfw.wmnet with OS bullseye
16:21 ladsgroup@deploy1003: ladsgroup: Continuing with sync
16:21 ladsgroup@deploy1003: ladsgroup: Backport for Reduce rate-limit for trusted editors of commons to 1500 every 3m (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd2001.codfw.wmnet with OS bookworm
16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-sd2003.codfw.wmnet with OS bookworm
16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:19 ladsgroup@deploy1003: Started scap sync-world: Backport for Reduce rate-limit for trusted editors of commons to 1500 every 3m (T370304)
16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:07 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd2001.codfw.wmnet with reason: host reimage
15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-sd2003.codfw.wmnet with reason: host reimage
15:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd2001.codfw.wmnet with reason: host reimage
15:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-sd2003.codfw.wmnet with reason: host reimage
15:39 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2035.codfw.wmnet [reason: T372160]
15:36 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp6016*} and A:cp for 9.2.5-1wm2
15:32 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp6016*} and A:cp for 9.2.5-1wm2
15:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-sd2003.codfw.wmnet with OS bookworm
15:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-sd2002.codfw.wmnet with OS bookworm
15:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-sd2001.codfw.wmnet with OS bookworm
15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logging-sd2003']
15:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-sd2003']
15:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logging-sd2002']
15:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:25 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-sd2002']
15:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-sd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:19 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1301.eqiad.wmnet with reason: host reimage
15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-sd2001']
15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-sd2003']
15:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1302.eqiad.wmnet with reason: host reimage
14:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1304.eqiad.wmnet with reason: host reimage
14:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-sd2003']
14:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-sd2001']
14:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1303.eqiad.wmnet with reason: host reimage
14:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1300.eqiad.wmnet with reason: host reimage
14:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1301.eqiad.wmnet with reason: host reimage
14:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1302.eqiad.wmnet with reason: host reimage
14:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1303.eqiad.wmnet with reason: host reimage
14:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1304.eqiad.wmnet with reason: host reimage
14:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1300.eqiad.wmnet with reason: host reimage
14:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:33 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:32 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:32 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:32 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1304.eqiad.wmnet with OS bullseye
14:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1302.eqiad.wmnet with OS bullseye
14:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1303.eqiad.wmnet with OS bullseye
14:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1301.eqiad.wmnet with OS bullseye
14:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1300.eqiad.wmnet with OS bullseye
14:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:29 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1303.eqiad.wmnet with OS bullseye
14:29 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1301.eqiad.wmnet with OS bullseye
14:29 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1300.eqiad.wmnet with OS bullseye
14:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1303.eqiad.wmnet with OS bullseye
14:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1301.eqiad.wmnet with OS bullseye
14:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1300.eqiad.wmnet with OS bullseye
14:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on wdqs[1022,1024].eqiad.wmnet with reason: noisy alerts, will look at later in the day
14:27 bking@cumin2002: START - Cookbook sre.hosts.downtime for 5:00:00 on wdqs[1022,1024].eqiad.wmnet with reason: noisy alerts, will look at later in the day
13:34 Lucas_WMDE: UTC afternoon backport+config window done (except for the T195546 maintenance script which is expected to keep running for a few more hours, currently at commonswiki)
13:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:27 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for (de|uk|ja|he|fi)wiki: enable shellbox-video (T356241) (duration: 06m 57s)
13:23 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
13:23 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s6
13:22 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, hnowlan: Continuing with sync
13:22 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, hnowlan: Backport for (de|uk|ja|he|fi)wiki: enable shellbox-video (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:20 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for (de|uk|ja|he|fi)wiki: enable shellbox-video (T356241)
13:17 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for Define wgVirtualDomainsMapping for virtual-checkuser-global (T371724) (duration: 10m 23s)
13:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T367856)', diff saved to https://phabricator.wikimedia.org/P67386 and previous config saved to /var/cache/conftool/dbconfig/20240819-131702-marostegui.json
13:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2162.codfw.wmnet with reason: Maintenance
13:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2162.codfw.wmnet with reason: Maintenance
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T367856)', diff saved to https://phabricator.wikimedia.org/P67385 and previous config saved to /var/cache/conftool/dbconfig/20240819-131640-marostegui.json
13:16 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1015.eqiad.wmnet with OS bookworm
13:13 logmsgbot: lucaswerkmeister-wmde@deploy1003 dreamyjazz, lucaswerkmeister-wmde: Continuing with sync
13:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:10 logmsgbot: lucaswerkmeister-wmde@deploy1003 dreamyjazz, lucaswerkmeister-wmde: Backport for Define wgVirtualDomainsMapping for virtual-checkuser-global (T371724) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
13:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "rdb1014 back to active - cgoubert@cumin1002 - T370633"
13:09 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "rdb1014 back to active - cgoubert@cumin1002 - T370633"
13:07 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for Define wgVirtualDomainsMapping for virtual-checkuser-global (T371724)
13:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:02 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint1002:~$ foreachwiki maintenance/cleanupTitles.php --prefix=T195546 --reporting-interval=1000000000 2>&1 | tee ~/T195546.log
13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P67384 and previous config saved to /var/cache/conftool/dbconfig/20240819-130132-marostegui.json
13:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:57 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
12:49 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1015.eqiad.wmnet with reason: host reimage
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P67383 and previous config saved to /var/cache/conftool/dbconfig/20240819-124625-marostegui.json
12:45 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1015.eqiad.wmnet with reason: host reimage
12:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:39 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
12:38 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:38 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:37 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:37 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:37 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:33 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1015.eqiad.wmnet with OS bookworm
12:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T367856)', diff saved to https://phabricator.wikimedia.org/P67382 and previous config saved to /var/cache/conftool/dbconfig/20240819-123119-marostegui.json
12:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:27 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1015.eqiad.wmnet with reason: Reimaging clouddb1015 T365424
12:27 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1015.eqiad.wmnet with reason: Reimaging clouddb1015 T365424
12:26 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s6
12:26 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Enable temporary accounts on test2wiki (T371116) (duration: 22m 14s)
12:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:18 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
12:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:16 dreamyjazz@deploy1003: dreamyjazz: Backport for Enable temporary accounts on test2wiki (T371116) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:15 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:11 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:03 kevinbazira@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:03 dreamyjazz@deploy1003: Started scap sync-world: Backport for Enable temporary accounts on test2wiki (T371116)
12:01 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:56 Dreamy_Jazz: Started scanning script for ruwiki with timeout of 6h to catch up to monthly request limit
11:49 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
11:30 kevinbazira@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:27 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
10:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
10:29 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
10:14 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:10 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
10:10 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67378 and previous config saved to /var/cache/conftool/dbconfig/20240819-100847-root.json
09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67377 and previous config saved to /var/cache/conftool/dbconfig/20240819-095342-root.json
09:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67376 and previous config saved to /var/cache/conftool/dbconfig/20240819-093836-root.json
09:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67375 and previous config saved to /var/cache/conftool/dbconfig/20240819-092331-root.json
09:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67374 and previous config saved to /var/cache/conftool/dbconfig/20240819-090825-root.json
09:07 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
09:06 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
08:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67373 and previous config saved to /var/cache/conftool/dbconfig/20240819-085320-root.json
08:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67372 and previous config saved to /var/cache/conftool/dbconfig/20240819-083814-root.json
08:35 marostegui: Upgrade db2136 to 10.11.9 T372551
08:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2136.codfw.wmnet with reason: Upgrade to 10.11.9
08:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2136.codfw.wmnet with reason: Upgrade to 10.11.9
08:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P67371 and previous config saved to /var/cache/conftool/dbconfig/20240819-083439-root.json
08:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
08:32 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
08:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
08:31 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
08:18 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:18 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding AAAA field to snapshot1010 and dumpsdata1003 - brouberol@cumin1002"
08:18 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding AAAA field to snapshot1010 and dumpsdata1003 - brouberol@cumin1002"
08:14 brouberol@cumin1002: START - Cookbook sre.dns.netbox
07:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
07:25 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
07:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
07:24 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
07:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
07:16 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
07:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
07:14 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67370 and previous config saved to /var/cache/conftool/dbconfig/20240819-070034-root.json
06:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67369 and previous config saved to /var/cache/conftool/dbconfig/20240819-064528-root.json
06:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67368 and previous config saved to /var/cache/conftool/dbconfig/20240819-063023-root.json
06:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67367 and previous config saved to /var/cache/conftool/dbconfig/20240819-061517-root.json
06:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67366 and previous config saved to /var/cache/conftool/dbconfig/20240819-060011-root.json
05:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67365 and previous config saved to /var/cache/conftool/dbconfig/20240819-054506-root.json
05:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67364 and previous config saved to /var/cache/conftool/dbconfig/20240819-053000-root.json
05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1195.eqiad.wmnet with reason: Upgrade to 10.6.19
05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1195.eqiad.wmnet with reason: Upgrade to 10.6.19
05:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1195 T372536', diff saved to https://phabricator.wikimedia.org/P67363 and previous config saved to /var/cache/conftool/dbconfig/20240819-052352-root.json

2024-08-18

22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Setting back s4 to RW', diff saved to https://phabricator.wikimedia.org/P67362 and previous config saved to /var/cache/conftool/dbconfig/20240818-220355-ladsgroup.json
22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set s4 as read-only ', diff saved to https://phabricator.wikimedia.org/P67361 and previous config saved to /var/cache/conftool/dbconfig/20240818-220043-ladsgroup.json
20:54 kamila@cumin2002: dbctl commit (dc=all): 'Setting s4 back to RW', diff saved to https://phabricator.wikimedia.org/P67360 and previous config saved to /var/cache/conftool/dbconfig/20240818-205410-kamila.json
20:50 kamila@cumin2002: dbctl commit (dc=all): 'Set s4 as read-only due to overload', diff saved to https://phabricator.wikimedia.org/P67359 and previous config saved to /var/cache/conftool/dbconfig/20240818-205024-kamila.json

2024-08-17

11:33 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67358 and previous config saved to /var/cache/conftool/dbconfig/20240817-113358-root.json
11:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67357 and previous config saved to /var/cache/conftool/dbconfig/20240817-111852-root.json
11:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67356 and previous config saved to /var/cache/conftool/dbconfig/20240817-110347-root.json
10:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67355 and previous config saved to /var/cache/conftool/dbconfig/20240817-104841-root.json
10:33 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67354 and previous config saved to /var/cache/conftool/dbconfig/20240817-103336-root.json
10:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67353 and previous config saved to /var/cache/conftool/dbconfig/20240817-101831-root.json
10:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67352 and previous config saved to /var/cache/conftool/dbconfig/20240817-100325-root.json
09:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T367856)', diff saved to https://phabricator.wikimedia.org/P67351 and previous config saved to /var/cache/conftool/dbconfig/20240817-095320-marostegui.json
09:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2154.codfw.wmnet with reason: Maintenance
09:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2154.codfw.wmnet with reason: Maintenance
09:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2152.codfw.wmnet with reason: Maintenance
09:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2152.codfw.wmnet with reason: Maintenance

2024-08-16

23:27 eevans@deploy1003: deploy aborted: Test deploy (duration: 00m 25s)
23:27 eevans@deploy1003: Started deploy [cassandra/logstash-logback-encoder@42653e6] (beta): Test deploy
20:26 eevans@deploy1003: Finished deploy [cassandra/logstash-logback-encoder@42653e6] (aqs): Test (duration: 00m 32s)
20:26 eevans@deploy1003: Started deploy [cassandra/logstash-logback-encoder@42653e6] (aqs): Test
20:15 eevans@deploy1003: Finished deploy [cassandra/logstash-logback-encoder@42653e6] (beta): Beta deploy (duration: 00m 31s)
20:14 eevans@deploy1003: Started deploy [cassandra/logstash-logback-encoder@42653e6] (beta): Beta deploy
20:12 eevans@deploy1003: Finished deploy [restbase/deploy@f696b76] (beta): deploy to beta (duration: 01m 05s)
20:11 eevans@deploy1003: Started deploy [restbase/deploy@f696b76] (beta): deploy to beta
20:11 eevans@deploy1003: deploy aborted: deploy to beta (duration: 00m 28s)
20:10 eevans@deploy1003: Started deploy [restbase/deploy@f696b76] (beta): deploy to beta
20:04 eevans@deploy1003: deploy aborted: (no justification provided) (duration: 00m 11s)
20:04 eevans@deploy1003: Started deploy [restbase/deploy@f696b76] (beta): (no justification provided)
20:01 eevans@deploy1003: deploy aborted: (no justification provided) (duration: 00m 20s)
20:01 eevans@deploy1003: Started deploy [restbase/deploy@f696b76] (beta): (no justification provided)
19:59 eevans@deploy1003: Finished deploy [restbase/deploy@f696b76] (beta): (no justification provided) (duration: 00m 33s)
19:59 eevans@deploy1003: Started deploy [restbase/deploy@f696b76] (beta): (no justification provided)
17:18 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
17:18 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-sd2003.mgmt.codfw.wmnet with reboot policy FORCED
15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-sd2001.mgmt.codfw.wmnet with reboot policy FORCED
15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-sd2003.mgmt.codfw.wmnet with reboot policy FORCED
15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-sd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-sd2001.mgmt.codfw.wmnet with reboot policy FORCED
15:30 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding logging-sd2004 to codfw - jhancock@cumin2002"
15:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding logging-sd2004 to codfw - jhancock@cumin2002"
15:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:12 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:11 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:10 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
15:10 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:08 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2043
15:08 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2043
15:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2043 to codfw - jhancock@cumin2002"
15:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2043 to codfw - jhancock@cumin2002"
15:06 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
15:05 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
15:04 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2042
15:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2042
15:04 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2042 to codfw - jhancock@cumin2002"
15:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2042 to codfw - jhancock@cumin2002"
15:00 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2041
14:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2041
14:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2041 to codfw - jhancock@cumin2002"
14:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2041 to codfw - jhancock@cumin2002"
14:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2040
14:52 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2040
14:52 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
14:51 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2040 to codfw - jhancock@cumin2002"
14:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2040 to codfw - jhancock@cumin2002"
14:49 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
14:48 fnegri@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1017.eqiad.wmnet with OS bookworm
14:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:46 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2039
14:46 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2039
14:46 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2039 to codfw - jhancock@cumin2002"
14:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2039 to codfw - jhancock@cumin2002"
14:42 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2038
14:35 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2038
14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2038 to codfw - jhancock@cumin2002"
14:35 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
14:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2038 to codfw - jhancock@cumin2002"
14:34 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
14:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:27 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
14:26 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2037
14:26 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2037
14:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2037 to codfw - jhancock@cumin2002"
14:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2037 to codfw - jhancock@cumin2002"
14:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
14:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
14:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
14:21 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti2037
14:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2037
14:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:03 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
14:00 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2036
13:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2036
13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2035
13:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2035
13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2036 to codfw - jhancock@cumin2002"
13:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2036 to codfw - jhancock@cumin2002"
13:55 jhancock@cumin2002: START - Cookbook sre.dns.netbox
13:51 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1017.eqiad.wmnet with OS bookworm
13:48 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
13:45 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
13:43 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1017.eqiad.wmnet with reason: Reimaging clouddb1017 T365424
13:43 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1017.eqiad.wmnet with reason: Reimaging clouddb1017 T365424
13:41 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
13:41 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
13:26 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
12:49 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
11:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:21 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
10:21 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1007.eqiad.wmnet with OS bullseye
10:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1009.eqiad.wmnet with OS bullseye
10:16 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1010.eqiad.wmnet with OS bullseye
10:14 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1008.eqiad.wmnet with OS bullseye
10:10 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1006.eqiad.wmnet with OS bullseye
10:05 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1007.eqiad.wmnet with reason: host reimage
10:02 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1009.eqiad.wmnet with reason: host reimage
09:58 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
09:58 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
09:57 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
09:56 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1008.eqiad.wmnet with reason: host reimage
09:53 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
09:53 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1006.eqiad.wmnet with reason: host reimage
09:51 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1009.eqiad.wmnet with reason: host reimage
09:51 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1008.eqiad.wmnet with reason: host reimage
09:51 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1007.eqiad.wmnet with reason: host reimage
09:50 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1006.eqiad.wmnet with reason: host reimage
09:50 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
09:46 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
09:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
09:43 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
09:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1010.eqiad.wmnet with OS bullseye
09:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1009.eqiad.wmnet with OS bullseye
09:34 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1008.eqiad.wmnet with OS bullseye
09:34 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1007.eqiad.wmnet with OS bullseye
09:33 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1006.eqiad.wmnet with OS bullseye
09:30 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
09:29 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
09:23 pfischer@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:23 pfischer@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
08:52 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1010.eqiad.wmnet with OS bullseye
08:50 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1009.eqiad.wmnet with OS bullseye
08:49 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1008.eqiad.wmnet with OS bullseye
08:48 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1007.eqiad.wmnet with OS bullseye
08:47 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1006.eqiad.wmnet with OS bullseye
08:20 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:20 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
08:05 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1010.eqiad.wmnet with OS bullseye
08:03 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1009.eqiad.wmnet with OS bullseye
08:02 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1008.eqiad.wmnet with OS bullseye
08:01 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1007.eqiad.wmnet with OS bullseye
08:00 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1006.eqiad.wmnet with OS bullseye
07:43 XioNoX: deploy pfw policy update 1723675086 - T372520
07:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2007.codfw.wmnet with OS bullseye
07:23 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2007.codfw.wmnet with reason: host reimage
07:20 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2007.codfw.wmnet with reason: host reimage
07:01 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2007.codfw.wmnet with OS bullseye
06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repool db2136 - running 10.11', diff saved to https://phabricator.wikimedia.org/P67345 and previous config saved to /var/cache/conftool/dbconfig/20240816-065606-marostegui.json
06:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change
06:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change

2024-08-15

23:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
23:10 xSavitar: T372449 mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Philip Federici' 'FilippoFederici' --ignorestatus
22:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards
22:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
22:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
22:02 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
21:54 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards
21:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards
21:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards
21:47 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:46 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
21:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
21:37 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:01 ebernhardson: backport window complete
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2035 to codfw - jhancock@cumin2002"
20:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2035 to codfw - jhancock@cumin2002"
20:54 ejegg: fundraising civicrm upgraded from eecbba5d to 56521963
20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
20:45 ebernhardson@deploy1003: Finished scap sync-world: Backport for cirrus: Stop general writes to private wikis (T341332) (duration: 08m 25s)
20:41 ebernhardson@deploy1003: ebernhardson: Continuing with sync
20:39 ebernhardson@deploy1003: ebernhardson: Backport for cirrus: Stop general writes to private wikis (T341332) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:37 ebernhardson@deploy1003: Started scap sync-world: Backport for cirrus: Stop general writes to private wikis (T341332)
20:30 ebernhardson@deploy1003: Finished scap sync-world: Backport for Revert "CommentFormatter: Switch from deprecated addJsConfigVars to new setJsConfigVar" (T372499) (duration: 10m 06s)
20:25 ebernhardson@deploy1003: ebernhardson, matmarex: Continuing with sync
20:23 ebernhardson@deploy1003: ebernhardson, matmarex: Backport for Revert "CommentFormatter: Switch from deprecated addJsConfigVars to new setJsConfigVar" (T372499) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:20 ebernhardson@deploy1003: Started scap sync-world: Backport for Revert "CommentFormatter: Switch from deprecated addJsConfigVars to new setJsConfigVar" (T372499)
away: running global rename cleanup script per T372006#10055573
18:15 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.18 refs T366963
18:02 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1001.eqiad.wmnet
18:00 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
17:45 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:44 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update mgmt dns for civi2002 frpig2002 - dwisehaupt@cumin1002"
17:44 dwisehaupt@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update mgmt dns for civi2002 frpig2002 - dwisehaupt@cumin1002"
17:41 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
17:22 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site ulsfo [reason: testing done, T369366]
17:22 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site ulsfo [reason: testing done, T369366]
17:13 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2008.codfw.wmnet with OS bullseye
17:07 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site ulsfo [reason: testing live change, T369366]
17:07 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site ulsfo [reason: testing live change, T369366]
16:54 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2008.codfw.wmnet with reason: host reimage
16:53 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
16:52 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
16:52 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
16:51 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2008.codfw.wmnet with reason: host reimage
16:51 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
15:55 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2007.codfw.wmnet with OS bullseye
15:53 SandraEbele_: reran druid_load_geoeditors_monthly, cassandra_load_editors_by_country_monthly, and druid_load_edit_hourly airflow dags with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
15:52 sukhe: sudo cumin -b1 -s60 "A:dnsbox" "run-puppet-agent --enable 'merging CR 1053929 T369366'": T369366
15:51 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:48 sukhe@cumin1002: START - Cookbook sre.dns.netbox
15:45 sukhe: running authdns-update again
15:43 sukhe: running authdns-update
15:31 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:31 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:30 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:30 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:27 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2008.codfw.wmnet with OS bullseye
15:21 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
15:21 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
15:21 sukhe: running authdns-update
15:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: moving ahead with admin_state migration]
15:10 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: no reason specified, no task ID specified]
15:09 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: no reason specified, no task ID specified]
15:09 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
15:09 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
15:04 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
15:03 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
15:02 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams [reason: testing on dns4004, no task ID specified]
15:01 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: testing on dns4004, no task ID specified]
15:01 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru [reason: testing on dns4004, no task ID specified]
15:00 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: testing on dns4004, no task ID specified]
14:57 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:53 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:49 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:48 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:47 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:46 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:43 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
14:41 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:36 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: testing on dns4004, no task ID specified]
14:36 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: testing on dns4004, no task ID specified]
14:35 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:34 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: testing on dns4004, no task ID specified]
14:33 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: testing on dns4004, no task ID specified]
14:25 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2007.codfw.wmnet with OS bullseye
14:21 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:21 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:01 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site magru for service: text-addrs|text-next [reason: testing on dns4004, no task ID specified]
14:00 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site magru for service: text-addrs|text-next [reason: testing on dns4004, no task ID specified]
13:59 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: testing on dns4004, no task ID specified]
13:59 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: testing on dns4004, no task ID specified]
13:54 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns4004.wikimedia.org [reason: admin_state migration test]
13:54 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns4004.wikimedia.org,service=recdns [reason: admin_state migration test]
13:52 Lucas_WMDE: UTC afternoon backport+config window done
13:51 sukhe: sudo cumin "A:dnsbox" 'disable-puppet "merging CR 1053929 T369366"'
13:50 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for Save the request before starting the automatic vanish job (T372006) (duration: 34m 44s)
13:50 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:47 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:46 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:45 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
13:45 logmsgbot: lucaswerkmeister-wmde@deploy1003 seddon, lucaswerkmeister-wmde: Continuing with sync
13:44 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
13:43 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
13:42 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:40 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
13:40 logmsgbot: lucaswerkmeister-wmde@deploy1003 seddon, lucaswerkmeister-wmde: Backport for Save the request before starting the automatic vanish job (T372006) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:38 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
13:35 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:35 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:34 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:34 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
13:34 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
13:33 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
13:33 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:32 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:31 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
13:31 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:30 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
13:26 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:25 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:23 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:22 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:16 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for Save the request before starting the automatic vanish job (T372006)
12:52 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2009.codfw.wmnet with OS bullseye
12:49 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2010.codfw.wmnet with OS bullseye
12:34 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2009.codfw.wmnet with reason: host reimage
12:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2010.codfw.wmnet with reason: host reimage
12:29 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2009.codfw.wmnet with reason: host reimage
12:28 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2010.codfw.wmnet with reason: host reimage
12:26 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:26 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:25 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:23 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:10 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye
12:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye
11:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67341 and previous config saved to /var/cache/conftool/dbconfig/20240815-114213-root.json
11:27 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67340 and previous config saved to /var/cache/conftool/dbconfig/20240815-112707-root.json
11:24 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67339 and previous config saved to /var/cache/conftool/dbconfig/20240815-111201-root.json
11:04 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:00 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67338 and previous config saved to /var/cache/conftool/dbconfig/20240815-105656-root.json
10:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67337 and previous config saved to /var/cache/conftool/dbconfig/20240815-104150-root.json
10:36 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2006.codfw.wmnet with OS bullseye
10:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1125.eqiad.wmnet with reason: Upgrade to 10.6.19
10:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1125.eqiad.wmnet with reason: Upgrade to 10.6.19
10:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc1014.eqiad.wmnet with reason: Upgrade to 10.6.19
10:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on pc1014.eqiad.wmnet with reason: Upgrade to 10.6.19
10:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2014.codfw.wmnet with reason: Upgrade to 10.6.19
10:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2014.codfw.wmnet with reason: Upgrade to 10.6.19
10:27 marostegui: Install 10.6.19 on pc1014 db1125 pc2014 T372536
10:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67336 and previous config saved to /var/cache/conftool/dbconfig/20240815-102645-root.json
10:21 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:19 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:18 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2006.codfw.wmnet with reason: host reimage
10:15 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2006.codfw.wmnet with reason: host reimage
10:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67335 and previous config saved to /var/cache/conftool/dbconfig/20240815-101139-root.json
09:55 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bullseye
09:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change
09:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change
09:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T367856)', diff saved to https://phabricator.wikimedia.org/P67334 and previous config saved to /var/cache/conftool/dbconfig/20240815-092502-marostegui.json
09:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2152.codfw.wmnet with reason: Maintenance
09:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2152.codfw.wmnet with reason: Maintenance
08:55 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2006.codfw.wmnet with OS bullseye
08:04 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bullseye
08:00 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2006.codfw.wmnet with OS bullseye
07:47 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2009.codfw.wmnet with OS bullseye
07:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 10:00:00 on 9 hosts with reason: T364368 non-prod hosts
07:31 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 10:00:00 on 9 hosts with reason: T364368 non-prod hosts
07:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bullseye
06:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67333 and previous config saved to /var/cache/conftool/dbconfig/20240815-063734-root.json
06:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67332 and previous config saved to /var/cache/conftool/dbconfig/20240815-062229-root.json
06:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67331 and previous config saved to /var/cache/conftool/dbconfig/20240815-060723-root.json
05:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67330 and previous config saved to /var/cache/conftool/dbconfig/20240815-055218-root.json
05:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67329 and previous config saved to /var/cache/conftool/dbconfig/20240815-053712-root.json
05:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67328 and previous config saved to /var/cache/conftool/dbconfig/20240815-052206-root.json
05:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67327 and previous config saved to /var/cache/conftool/dbconfig/20240815-050701-root.json
05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1223 T372393', diff saved to https://phabricator.wikimedia.org/P67326 and previous config saved to /var/cache/conftool/dbconfig/20240815-050613-root.json
05:04 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1189 to s3 primary and set section read-write T372393', diff saved to https://phabricator.wikimedia.org/P67325 and previous config saved to /var/cache/conftool/dbconfig/20240815-050428-root.json
05:04 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T372393', diff saved to https://phabricator.wikimedia.org/P67324 and previous config saved to /var/cache/conftool/dbconfig/20240815-050410-root.json
05:03 marostegui: Starting s3 eqiad failover from db1223 to db1189 - T372393
04:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Stop MariaDB on db1238 T371342
04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Stop MariaDB on db1238 T371342
04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T372393
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1189 with weight 0 T372393', diff saved to https://phabricator.wikimedia.org/P67323 and previous config saved to /var/cache/conftool/dbconfig/20240815-044929-root.json
04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T372393
03:26 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
03:26 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix mgmt DNS for fd2004 - pt1979@cumin2002"
03:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix mgmt DNS for fd2004 - pt1979@cumin2002"
03:22 pt1979@cumin2002: START - Cookbook sre.dns.netbox
02:24 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@02f37cf]: (no justification provided) (duration: 00m 43s)
02:23 milimetric@deploy1003: Started deploy [airflow-dags/analytics@02f37cf]: (no justification provided)

2024-08-14

23:34 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:33 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
23:30 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:30 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
23:09 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:09 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:07 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:05 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
22:56 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:56 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:52 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:51 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
22:50 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:50 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:50 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:48 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
22:48 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:48 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:28 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:28 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
22:17 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:15 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:07 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:05 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:31 jhuneidi@deploy1003: Finished scap sync-world: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis" (duration: 06m 40s)
20:26 jhuneidi@deploy1003: trainbranchbot, jhuneidi: Continuing with sync
20:26 jhuneidi@deploy1003: trainbranchbot, jhuneidi: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:24 jhuneidi@deploy1003: Started scap sync-world: Backport for Revert "Activates the "compact" Parsoid indicator on all wikivoyage wikis"
20:21 jhuneidi@deploy1003: Sync cancelled.
20:14 jhuneidi@deploy1003: ihurbain, jhuneidi: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:11 jhuneidi@deploy1003: Started scap sync-world: Backport for Activates the "compact" Parsoid indicator on all wikivoyage wikis
19:26 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@6d50458]: Test Refine through Airflow (duration: 00m 12s)
19:26 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@6d50458]: Test Refine through Airflow
18:14 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.18 refs T366963
17:50 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2010.codfw.wmnet with OS bullseye
17:35 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: no reason specified, no task ID specified]
17:35 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: no reason specified, no task ID specified]
17:32 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams for service: text-addrs|text-next [reason: no reason specified, no task ID specified]
17:32 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site esams for service: text-addrs|text-next [reason: no reason specified, no task ID specified]
17:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
17:31 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
17:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru for service: text-addrs [reason: no reason specified, no task ID specified]
17:31 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site magru for service: text-addrs [reason: no reason specified, no task ID specified]
17:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
17:31 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
17:30 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: testing cookbook, T369366]
17:30 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: testing cookbook, T369366]
17:30 sukhe@cumin1002: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: pool site eqiad [reason: no reason specified, no task ID specified]
17:30 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: no reason specified, no task ID specified]
17:30 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: show site None [reason: no reason specified, no task ID specified]
17:30 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: show site None [reason: no reason specified, no task ID specified]
17:17 otto@deploy1003: Finished deploy [airflow-dags/analytics_product@6d50458]: (no justification provided) (duration: 00m 08s)
17:17 otto@deploy1003: Started deploy [airflow-dags/analytics_product@6d50458]: (no justification provided)
17:16 SandraEbele_: reran geoeditors_public_monthly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
17:13 ladsgroup@deploy1003: Finished scap sync-world: Backport for Avoid primary DB query for non-talk page edits (T370304), Avoid primary DB query for non-talk page edits (T370304) (duration: 07m 54s)
17:12 SandraEbele_: reran geoeditors_monthly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
17:09 SandraEbele_: reran geoeditors_edits_monthly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
17:08 ladsgroup@deploy1003: ladsgroup: Continuing with sync
17:07 ladsgroup@deploy1003: ladsgroup: Backport for Avoid primary DB query for non-talk page edits (T370304), Avoid primary DB query for non-talk page edits (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:05 ladsgroup@deploy1003: Started scap sync-world: Backport for Avoid primary DB query for non-talk page edits (T370304), Avoid primary DB query for non-talk page edits (T370304)
16:59 otto@deploy1003: Finished deploy [analytics/refinery@f033576]: Regular analytics weekly train [analytics/refinery@f0335766] (duration: 06m 48s)
16:55 SandraEbele_: reran unique_editors_by_country_monthly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
16:52 SandraEbele_: reran edit_hourly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize for 2024-06 snapshot.
16:52 otto@deploy1003: Started deploy [analytics/refinery@f033576]: Regular analytics weekly train [analytics/refinery@f0335766]
16:52 otto@deploy1003: Finished deploy [analytics/refinery@f033576] (thin): Regular analytics weekly train THIN [analytics/refinery@f0335766] (duration: 04m 13s)
16:48 SandraEbele_: reran editors_daily_monthly airflow dag with run_id scheduled__2024-06-01T00:00:00+00:00 as part of downstream tasks after rerunning mediawiki_history_denormalize dag
16:48 otto@deploy1003: Started deploy [analytics/refinery@f033576] (thin): Regular analytics weekly train THIN [analytics/refinery@f0335766]
16:45 otto@deploy1003: Finished deploy [analytics/refinery@f033576] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f0335766] (duration: 03m 06s)
16:45 ladsgroup@deploy1003: ladsgroup: Continuing with sync
16:43 ladsgroup@deploy1003: ladsgroup: Backport for Avoid primary DB query for non-talk page edits (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:42 otto@deploy1003: Started deploy [analytics/refinery@f033576] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f0335766]
16:41 ladsgroup@deploy1003: Started scap sync-world: Backport for Avoid primary DB query for non-talk page edits (T370304)
16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67318 and previous config saved to /var/cache/conftool/dbconfig/20240814-162854-arnaudb.json
16:24 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2010.codfw.wmnet with OS bullseye
16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67317 and previous config saved to /var/cache/conftool/dbconfig/20240814-161350-arnaudb.json
16:04 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
16:04 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
16:03 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
16:01 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye
15:58 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67316 and previous config saved to /var/cache/conftool/dbconfig/20240814-155844-arnaudb.json
15:48 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:47 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:43 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67315 and previous config saved to /var/cache/conftool/dbconfig/20240814-154338-arnaudb.json
15:40 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
15:39 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:39 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
15:39 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
15:39 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:39 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
15:34 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye
15:28 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 16%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67314 and previous config saved to /var/cache/conftool/dbconfig/20240814-152833-arnaudb.json
15:13 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 8%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67312 and previous config saved to /var/cache/conftool/dbconfig/20240814-151328-arnaudb.json
14:59 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2010.codfw.wmnet
14:58 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 4%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67307 and previous config saved to /var/cache/conftool/dbconfig/20240814-145819-arnaudb.json
14:53 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2010.codfw.wmnet
14:49 jayme@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-main2010.codfw.wmnet with OS bookworm
14:43 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bookworm
14:43 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67305 and previous config saved to /var/cache/conftool/dbconfig/20240814-144314-arnaudb.json
14:32 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
14:28 arnaudb@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: broken disk replaced, slow repooling', diff saved to https://phabricator.wikimedia.org/P67304 and previous config saved to /var/cache/conftool/dbconfig/20240814-142808-arnaudb.json
14:27 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
14:22 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
14:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1 es1029 depooling for hdd hotswap', diff saved to https://phabricator.wikimedia.org/P67299 and previous config saved to /var/cache/conftool/dbconfig/20240814-142147-arnaudb.json
14:21 ebernhardson@deploy1003: Synchronized private/PrivateSettings.php: Update NetworkSession users list for T341332 (duration: 12m 33s)
14:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:55 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: sync
13:55 elukey@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: sync
13:52 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
13:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:33 kartik@deploy1003: Finished scap sync-world: Backport for Use the updated recommendation API from liftwing (T371465) (duration: 07m 51s)
13:32 jayme@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-main2010.codfw.wmnet']
13:29 kartik@deploy1003: kartik: Continuing with sync
13:28 kartik@deploy1003: kartik: Backport for Use the updated recommendation API from liftwing (T371465) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:25 kartik@deploy1003: Started scap sync-world: Backport for Use the updated recommendation API from liftwing (T371465)
13:25 kartik@deploy1003: Finished scap sync-world: Backport for Use the updated recommendation API from liftwing (T371465) (duration: 08m 37s)
13:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 100%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67296 and previous config saved to /var/cache/conftool/dbconfig/20240814-132256-arnaudb.json
13:22 jayme@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-main2010.codfw.wmnet']
13:20 kartik@deploy1003: kartik: Continuing with sync
13:19 kartik@deploy1003: kartik: Backport for Use the updated recommendation API from liftwing (T371465) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:18 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:18 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:16 kartik@deploy1003: Started scap sync-world: Backport for Use the updated recommendation API from liftwing (T371465)
13:14 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:14 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:11 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:11 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 75%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67295 and previous config saved to /var/cache/conftool/dbconfig/20240814-130750-arnaudb.json
12:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 50%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67293 and previous config saved to /var/cache/conftool/dbconfig/20240814-125245-arnaudb.json
12:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on 9 hosts with reason: replication table exclusion deployment
12:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on 9 hosts with reason: replication table exclusion deployment
12:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 25%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67292 and previous config saved to /var/cache/conftool/dbconfig/20240814-123739-arnaudb.json
12:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 16%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67291 and previous config saved to /var/cache/conftool/dbconfig/20240814-122234-arnaudb.json
12:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 8%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67290 and previous config saved to /var/cache/conftool/dbconfig/20240814-120729-arnaudb.json
11:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 4%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67289 and previous config saved to /var/cache/conftool/dbconfig/20240814-115223-arnaudb.json
11:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 2%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67288 and previous config saved to /var/cache/conftool/dbconfig/20240814-113718-arnaudb.json
11:23 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:23 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 1%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67287 and previous config saved to /var/cache/conftool/dbconfig/20240814-112212-arnaudb.json
11:20 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:19 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
11:19 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:18 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
09:56 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
09:26 klausman@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
09:26 klausman@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
09:23 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
09:23 klausman@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
09:17 klausman@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
09:16 klausman@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
09:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: replication still catching up
09:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: replication still catching up
08:53 jayme@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-main2010.codfw.wmnet with OS bullseye
08:46 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye
07:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: index corruption
07:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: index corruption
00:54 eileen: config revision changed from d6f17100 to f569b590
00:41 eileen: civicrm upgraded from dd54b9ae to eecbba5d
00:11 eileen: civicrm upgraded from 686c7c5f to dd54b9ae
00:04 eileen: config revision changed from e8cc0ed6 to d6f17100

2024-08-13

23:08 ejegg: payments-wiki upgraded from 2d48f432 to 3eb3be67
21:56 inflatador: bking@cumin2002 reboot wdqs101[3-5],1018,1020 from DRAC due to unresponsiveness T372442
21:16 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:16 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:15 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:15 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:09 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:09 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:07 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:07 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:51 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards
20:22 brett: Update ncmonitor to 1.2.0 via apt1002
19:57 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards
19:44 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:43 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:32 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (2 nodes at a time) for ElasticSearch cluster search_eqiad: security update - bking@cumin2002 - T371874
19:29 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards
19:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards
19:25 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:25 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:25 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:25 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:24 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:24 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:05 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.18 refs T366963
18:54 jhuneidi@deploy1003: Finished scap sync-world: Backport for Revert "Prevent dark-mode styles from affecting print media" (T372370) (duration: 10m 58s)
18:50 jhuneidi@deploy1003: jdlrobson, jhuneidi: Continuing with sync
18:46 jhuneidi@deploy1003: jdlrobson, jhuneidi: Backport for Revert "Prevent dark-mode styles from affecting print media" (T372370) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:44 jhuneidi@deploy1003: Started scap sync-world: Backport for Revert "Prevent dark-mode styles from affecting print media" (T372370)
18:42 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:41 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:41 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:41 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:40 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:40 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:45 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade — T371874 - eevans@cumin1002
17:40 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (2 nodes at a time) for ElasticSearch cluster search_eqiad: security update - bking@cumin2002 - T371874
17:39 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: security update - bking@cumin2002 - T371874
17:39 jhuneidi@deploy1003: Finished scap sync-world: testing T371904 (duration: 10m 31s)
17:28 jhuneidi@deploy1003: Started scap sync-world: testing T371904
17:27 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade — T371874 - eevans@cumin1002
17:26 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: security update - bking@cumin2002 - T371874
17:26 swfrench-wmf: run-puppet-agent on deploy1003 to pick up scap.cfg change for T371904
17:25 jhuneidi@deploy1003: Installation of scap version "latest" completed for 211 hosts
17:24 jhuneidi@deploy1003: Installing scap version "latest" for 211 hosts
17:24 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade — T371874 - eevans@cumin1002
17:06 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade — T371874 - eevans@cumin1002
16:57 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:56 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:50 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:50 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 7 hosts with reason: prep for replacement of cloudsw1-d5-eqiad
16:22 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on 7 hosts with reason: prep for replacement of cloudsw1-d5-eqiad
15:39 mutante: gerrit - starting to drop packets from abusive sources (T365259)
15:38 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2006.codfw.wmnet with OS bookworm
14:56 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2006.codfw.wmnet with OS bookworm
14:25 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2009.codfw.wmnet with OS bullseye
14:24 btullis@deploy1003: Finished deploy [airflow-dags/wmde@109c99e]: (no justification provided) (duration: 00m 08s)
14:24 btullis@deploy1003: Started deploy [airflow-dags/wmde@109c99e]: (no justification provided)
14:24 btullis@deploy1003: Finished deploy [airflow-dags/analytics_product@109c99e]: (no justification provided) (duration: 00m 09s)
14:23 btullis@deploy1003: Started deploy [airflow-dags/analytics_product@109c99e]: (no justification provided)
14:23 btullis@deploy1003: Finished deploy [airflow-dags/platform_eng@109c99e]: (no justification provided) (duration: 00m 24s)
14:23 btullis@deploy1003: Started deploy [airflow-dags/platform_eng@109c99e]: (no justification provided)
14:22 btullis@deploy1003: Finished deploy [airflow-dags/research@109c99e]: (no justification provided) (duration: 00m 11s)
14:22 btullis@deploy1003: Started deploy [airflow-dags/research@109c99e]: (no justification provided)
14:22 btullis@deploy1003: Finished deploy [airflow-dags/search@109c99e]: (no justification provided) (duration: 00m 19s)
14:21 btullis@deploy1003: Started deploy [airflow-dags/search@109c99e]: (no justification provided)
14:21 btullis@deploy1003: Finished deploy [airflow-dags/analytics_test@109c99e]: (no justification provided) (duration: 00m 09s)
14:21 btullis@deploy1003: Started deploy [airflow-dags/analytics_test@109c99e]: (no justification provided)
14:18 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:18 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
13:57 Lucas_WMDE: UTC backport+config window done (since ~13:10, really)
13:49 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@109c99e]: Airflow upgrade to v 2.9.3 for analytics instance. T365449. (duration: 00m 40s)
13:48 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@109c99e]: Airflow upgrade to v 2.9.3 for analytics instance. T365449.
13:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: update wheels - ayounsi@cumin1002
13:41 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: update wheels - ayounsi@cumin1002
13:40 XioNoX: update homer wheels - T371890
13:36 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye
13:35 jayme@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2009.codfw.wmnet with OS bullseye
13:26 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye
13:25 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2009.codfw.wmnet with OS bullseye
13:05 elukey: `apt-get install python3-conftool python3-conftool-requestctl` on all puppetserver nodes - upgrade to 3.2.2
13:04 Lucas_WMDE: FINISHED lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/cleanupTitles.php --wiki=hewikisource --prefix=T314733 2>&1 | tee ~/T314733.log
13:01 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/cleanupTitles.php --wiki=hewikisource --prefix=T314733 2>&1 | tee ~/T314733.log
12:47 filippo@deploy1003: Finished scap: new statsd-exporter limits (duration: 03m 52s)
12:43 filippo@deploy1003: Started scap sync-world: new statsd-exporter limits
12:37 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2009.codfw.wmnet with OS bullseye
12:18 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2010.codfw.wmnet with OS bullseye
11:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-launcher1002.eqiad.wmnet
11:29 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-launcher1002.eqiad.wmnet
11:28 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye
11:17 XioNoX: deploy pfw policy update 1723510554 - T372367
11:11 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1005.eqiad.wmnet
11:07 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1005.eqiad.wmnet
11:01 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1006.eqiad.wmnet
10:57 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1006.eqiad.wmnet
10:53 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet
10:49 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet
10:38 dcaro@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:38 dcaro@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added ipv6 entry for cloudcephosd1039 - dcaro@cumin1002"
10:38 dcaro@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added ipv6 entry for cloudcephosd1039 - dcaro@cumin1002"
10:38 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
10:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
10:27 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s8
10:27 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s5
10:26 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1016.eqiad.wmnet with OS bookworm
10:15 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s8
10:13 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s8
10:13 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s5
10:11 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1007.eqiad.wmnet
10:10 dcaro@cumin1002: START - Cookbook sre.dns.netbox
10:07 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1007.eqiad.wmnet
10:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
10:05 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-airflow1007.eqiad.wmnet
10:05 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-airflow1007.eqiad.wmnet
10:02 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1016.eqiad.wmnet with reason: host reimage
10:00 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
09:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
09:59 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1016.eqiad.wmnet with reason: host reimage
09:59 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
09:54 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main2010.codfw.wmnet with OS bullseye
09:46 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1016.eqiad.wmnet with OS bookworm
09:41 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s8
09:41 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s5
09:40 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1016.eqiad.wmnet with reason: Reimaging clouddb1016 T365424
09:40 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1016.eqiad.wmnet with reason: Reimaging clouddb1016 T365424
09:23 elukey: manual run of dump_cloud_ip_ranges.service on puppetserver1001 (failed earlier on)
09:03 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye
09:01 kevinbazira@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
08:59 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
08:52 elukey: upgrade conftool python packages on puppetserver1001 to 3.2.2
08:51 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:49 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2002.codfw.wmnet
08:48 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:36 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
08:19 XioNoX: upgrade postgresql on netboxdb hosts
08:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
08:00 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
07:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
07:47 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
07:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: index corruption
07:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: index corruption
07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'es1 master: es1027', diff saved to https://phabricator.wikimedia.org/P67282 and previous config saved to /var/cache/conftool/dbconfig/20240813-071240-arnaudb.json
04:00 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.15 (duration: 00m 56s)
03:50 mwpresync@deploy1003: Finished scap: testwikis to 1.43.0-wmf.18 refs T366963 (duration: 48m 26s)
03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.18 refs T366963

2024-08-12

23:00 rzl@deploy1003: Finished scap: https://gerrit.wikimedia.org/r/1060515 (duration: 02m 14s)
22:58 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1060515
21:22 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Apply openjdk upgrade — T371874 - eevans@cumin1002
21:17 zabe: start wrapping type B password hashes with encrypted pbkdf2 in screen - T112359
20:50 jhathaway: upgrading postgresql on puppetdb1003
20:45 jhathaway: upgrading postgresql on puppetdb2003
20:29 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
20:29 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
20:24 zabe: update prefix of wrongly prefixed user password hashes from ':A:' to ':B:' in small batches -- T112359
20:22 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:19 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
20:19 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
20:18 zabe@deploy1003: Finished scap: Backport for EventStreamConfig for mediawiki.cirrussearch.page_weighted_tags_change.rc0 (T366253) (duration: 07m 55s)
20:14 zabe@deploy1003: pfischer, zabe: Continuing with sync
20:13 zabe@deploy1003: pfischer, zabe: Backport for EventStreamConfig for mediawiki.cirrussearch.page_weighted_tags_change.rc0 (T366253) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:10 zabe@deploy1003: Started scap sync-world: Backport for EventStreamConfig for mediawiki.cirrussearch.page_weighted_tags_change.rc0 (T366253)
20:10 zabe@deploy1003: Finished scap: Backport for Set wgAutoConfirmCount to 10 for azwiki (T372172) (duration: 08m 01s)
20:05 zabe@deploy1003: nmw03, zabe: Continuing with sync
20:04 zabe@deploy1003: nmw03, zabe: Backport for Set wgAutoConfirmCount to 10 for azwiki (T372172) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:02 zabe@deploy1003: Started scap sync-world: Backport for Set wgAutoConfirmCount to 10 for azwiki (T372172)
20:01 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:47 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
19:47 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
19:46 ottomata: rolling restart of eventgate-main in codfw - T371767
19:38 zabe@deploy1003: Finished scap: Backport for Use encrypted PBKDF2 for wrapping B type passwords instead of Argon2 (T112359) (duration: 07m 08s)
19:33 zabe@deploy1003: zabe: Continuing with sync
19:33 zabe@deploy1003: zabe: Backport for Use encrypted PBKDF2 for wrapping B type passwords instead of Argon2 (T112359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P67279 and previous config saved to /var/cache/conftool/dbconfig/20240812-193157-ladsgroup.json
19:30 zabe@deploy1003: Started scap sync-world: Backport for Use encrypted PBKDF2 for wrapping B type passwords instead of Argon2 (T112359)
19:21 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=dewiki --force --db-table --verbose # T372333, script finished, logs are (gzipped) at F57269843
19:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P67278 and previous config saved to /var/cache/conftool/dbconfig/20240812-191650-ladsgroup.json
19:15 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=dewiki --force --db-table --verbose # T372333, script started
19:13 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=dewiki --search-index --verbose # T372333, logs available as P67277
19:09 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply openjdk upgrade — T371874 - eevans@cumin1002
19:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P67276 and previous config saved to /var/cache/conftool/dbconfig/20240812-190145-ladsgroup.json
18:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1238 (T371342)', diff saved to https://phabricator.wikimedia.org/P67275 and previous config saved to /var/cache/conftool/dbconfig/20240812-184830-ladsgroup.json
18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P67274 and previous config saved to /var/cache/conftool/dbconfig/20240812-184639-ladsgroup.json
18:06 urbanecm@deploy1003: Finished scap: Backport for [Growth] dewiki: Enable frontend for Add Link (T371597) (duration: 09m 59s)
18:02 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:02 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:02 urbanecm@deploy1003: urbanecm: Continuing with sync
17:58 urbanecm@deploy1003: urbanecm: Backport for [Growth] dewiki: Enable frontend for Add Link (T371597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:57 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:57 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:56 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] dewiki: Enable frontend for Add Link (T371597)
17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testhost2001.codfw.wmnet with OS bookworm
17:27 ebernhardson@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:27 ebernhardson@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
17:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
16:59 urbanecm@deploy1003: Finished scap: Backport for Revert "[Growth] dewiki: Enable frontend for Add Link" (duration: 06m 39s)
16:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
16:55 urbanecm@deploy1003: urbanecm, trainbranchbot: Continuing with sync
16:55 urbanecm@deploy1003: urbanecm, trainbranchbot: Backport for Revert "[Growth] dewiki: Enable frontend for Add Link" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:53 urbanecm@deploy1003: Started scap sync-world: Backport for Revert "[Growth] dewiki: Enable frontend for Add Link"
16:51 urbanecm@deploy1003: Sync cancelled.
16:50 urbanecm@deploy1003: urbanecm: Backport for [Growth] dewiki: Enable frontend for Add Link (T371597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:48 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] dewiki: Enable frontend for Add Link (T371597)
16:36 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:36 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
16:33 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: security update - bking@cumin2002 - T371874
16:32 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:32 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
16:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
16:13 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
16:10 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:09 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
16:00 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:00 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:56 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
15:56 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@416511b]: (no justification provided) (duration: 00m 40s)
15:55 milimetric@deploy1003: Started deploy [airflow-dags/analytics@416511b]: (no justification provided)
15:54 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
15:54 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
15:54 cdanis@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
15:54 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
15:53 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
15:53 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
15:53 cdanis@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
15:53 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
15:52 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
15:52 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
15:52 cdanis@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
15:52 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
15:51 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/services/apertium: apply
15:51 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/services/apertium: apply
15:50 cdanis@deploy1003: helmfile [codfw] START helmfile.d/services/apertium: apply
15:36 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
15:35 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
15:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
15:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
15:23 urbanecm@deploy1003: Finished scap: Backport for noc: Fix list of databases in db.php (T372249) (duration: 08m 22s)
15:15 urbanecm@deploy1003: Started scap sync-world: Backport for noc: Fix list of databases in db.php (T372249)
15:07 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
15:06 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
14:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:44 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: security update - bking@cumin2002 - T371874
14:42 elukey: powercycle ms-be1078 - causing frontend errors in swift-eqiad, network link is down (if down/up didn't work, nothing in the dmesg/syslog)
14:42 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:41 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:34 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:23 zabe@deploy1003: Finished scap: Backport for Further configuration for bdrwiki (T371760) (duration: 21m 07s)
14:01 zabe@deploy1003: Started scap sync-world: Backport for Further configuration for bdrwiki (T371760)
13:46 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
13:46 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
13:33 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
13:33 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
13:25 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:24 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:24 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:37 elukey: restart exim4 on list2001 to pick up the new TLS material
12:35 elukey: restart exim4 on list1004 to pick up the new TLS material
12:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
12:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
12:11 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Openjdk upgrade - elukey@cumin1002
12:04 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
12:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
11:59 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
11:26 hnowlan: rebuilding php7.4-fpm and php7.4-fpm-multiversion-base to pick up healthz worker awareness change (r/1060867)
11:22 ladsgroup@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
11:10 kevinbazira@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:06 isaranto@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
11:04 isaranto@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
11:03 isaranto@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:19 vgutierrez: restarting apache on puppetmaster1003
09:54 kamila_: rebooting puppetmaster1001 due to intermittent network failures
09:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 54994
09:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 54994
09:17 urbanecm@deploy1003: Finished scap: Backport for MenteeOverviewApi: Do not apply undefined/null params (T372164) (duration: 19m 54s)
09:11 urbanecm@deploy1003: urbanecm: Continuing with sync
09:11 godog: bounce grafana after https://gerrit.wikimedia.org/r/c/operations/puppet/+/1061955
09:10 urbanecm@deploy1003: urbanecm: Backport for MenteeOverviewApi: Do not apply undefined/null params (T372164) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:57 urbanecm@deploy1003: Started scap sync-world: Backport for MenteeOverviewApi: Do not apply undefined/null params (T372164)
07:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: index corruption
07:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: index corruption
07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 - s2', diff saved to https://phabricator.wikimedia.org/P67270 and previous config saved to /var/cache/conftool/dbconfig/20240812-073846-arnaudb.json

2024-08-11

07:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
07:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
07:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367856)', diff saved to https://phabricator.wikimedia.org/P67269 and previous config saved to /var/cache/conftool/dbconfig/20240811-075839-marostegui.json
07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P67268 and previous config saved to /var/cache/conftool/dbconfig/20240811-074332-marostegui.json
07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P67267 and previous config saved to /var/cache/conftool/dbconfig/20240811-072825-marostegui.json
07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367856)', diff saved to https://phabricator.wikimedia.org/P67266 and previous config saved to /var/cache/conftool/dbconfig/20240811-071318-marostegui.json
03:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20240729/ using stat1009.eqiad.wmnet)

2024-08-10

08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T367856)', diff saved to https://phabricator.wikimedia.org/P67264 and previous config saved to /var/cache/conftool/dbconfig/20240810-085527-marostegui.json
08:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
08:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367856)', diff saved to https://phabricator.wikimedia.org/P67263 and previous config saved to /var/cache/conftool/dbconfig/20240810-085505-marostegui.json
08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P67262 and previous config saved to /var/cache/conftool/dbconfig/20240810-083958-marostegui.json
08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P67261 and previous config saved to /var/cache/conftool/dbconfig/20240810-082450-marostegui.json
08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367856)', diff saved to https://phabricator.wikimedia.org/P67260 and previous config saved to /var/cache/conftool/dbconfig/20240810-080943-marostegui.json

2024-08-09

22:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1299.eqiad.wmnet with OS bullseye
21:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1266.eqiad.wmnet with OS bullseye
21:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
21:29 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
21:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1266.eqiad.wmnet with reason: host reimage
21:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1266.eqiad.wmnet with reason: host reimage
21:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1266.eqiad.wmnet with OS bullseye
21:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1299.eqiad.wmnet with OS bullseye
21:09 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1299.eqiad.wmnet with OS bullseye
21:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1299.eqiad.wmnet with OS bullseye
20:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1297.eqiad.wmnet with OS bullseye
20:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1297.eqiad.wmnet with reason: host reimage
20:06 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1297.eqiad.wmnet with reason: host reimage
20:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
19:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
19:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1300.eqiad.wmnet with OS bullseye
19:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1300.mgmt.eqiad.wmnet with reboot policy FORCED
19:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1303.eqiad.wmnet with OS bullseye
19:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1301.eqiad.wmnet with OS bullseye
19:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1303.mgmt.eqiad.wmnet with reboot policy FORCED
19:50 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1302.eqiad.wmnet with OS bullseye
19:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1299.eqiad.wmnet with OS bullseye
19:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1304.eqiad.wmnet with OS bullseye
19:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1297.eqiad.wmnet with OS bullseye
19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1301.mgmt.eqiad.wmnet with reboot policy FORCED
19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1302.mgmt.eqiad.wmnet with reboot policy FORCED
19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1304.mgmt.eqiad.wmnet with reboot policy FORCED
19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1299.mgmt.eqiad.wmnet with reboot policy FORCED
19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1297.mgmt.eqiad.wmnet with reboot policy FORCED
19:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
19:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
19:29 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1303.mgmt.eqiad.wmnet with reboot policy FORCED
19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1302.mgmt.eqiad.wmnet with reboot policy FORCED
19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1304.mgmt.eqiad.wmnet with reboot policy FORCED
19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1299.mgmt.eqiad.wmnet with reboot policy FORCED
19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1301.mgmt.eqiad.wmnet with reboot policy FORCED
19:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1300.mgmt.eqiad.wmnet with reboot policy FORCED
19:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1298.mgmt.eqiad.wmnet with reboot policy FORCED
19:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1297.mgmt.eqiad.wmnet with reboot policy FORCED
19:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker - jclark@cumin1002"
19:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker - jclark@cumin1002"
19:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
19:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1260.eqiad.wmnet with OS bullseye
19:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:10 inflatador: bking@wdqs-codfw-public mitigate codfw wdqs abuse via nginx hotfix T372074
17:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1260.eqiad.wmnet with reason: host reimage
17:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1260.eqiad.wmnet with reason: host reimage
17:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1260.eqiad.wmnet with OS bullseye
17:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1016.mgmt.eqiad.wmnet with reboot policy FORCED
17:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1019.mgmt.eqiad.wmnet with reboot policy FORCED
17:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1018.mgmt.eqiad.wmnet with reboot policy FORCED
17:15 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1020.mgmt.eqiad.wmnet with reboot policy FORCED
17:14 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1260
17:13 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1260
17:12 jclark@cumin1002: START - Cookbook sre.dns.netbox
17:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1017.mgmt.eqiad.wmnet with reboot policy FORCED
17:00 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1266.eqiad.wmnet with OS bullseye
16:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1018.mgmt.eqiad.wmnet with reboot policy FORCED
16:50 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-presto1018.mgmt.eqiad.wmnet with reboot policy FORCED
16:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1020.mgmt.eqiad.wmnet with reboot policy FORCED
16:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1019.mgmt.eqiad.wmnet with reboot policy FORCED
16:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1018.mgmt.eqiad.wmnet with reboot policy FORCED
16:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1017.mgmt.eqiad.wmnet with reboot policy FORCED
16:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-presto1016.mgmt.eqiad.wmnet with reboot policy FORCED
16:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1266.eqiad.wmnet with reason: host reimage
16:46 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt an-presto1016-20 - jclark@cumin1002"
16:46 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt an-presto1016-20 - jclark@cumin1002"
16:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1266.eqiad.wmnet with reason: host reimage
16:43 jclark@cumin1002: START - Cookbook sre.dns.netbox
16:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1266.eqiad.wmnet with OS bullseye
16:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1260.eqiad.wmnet with OS bullseye
15:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus1007.eqiad.wmnet with OS bookworm
15:08 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus1008.eqiad.wmnet with OS bookworm
15:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host alert1002.wikimedia.org with OS bookworm
14:52 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus1008.eqiad.wmnet with reason: host reimage
14:37 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus1008.eqiad.wmnet with reason: host reimage
14:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus1007.eqiad.wmnet with reason: host reimage
14:26 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus1007.eqiad.wmnet with reason: host reimage
13:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host alert1002.wikimedia.org with OS bookworm
13:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host prometheus1008.mgmt.eqiad.wmnet with reboot policy FORCED
13:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host prometheus1007.mgmt.eqiad.wmnet with reboot policy FORCED
13:55 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt prometheus1007-8 - jclark@cumin1002"
13:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt prometheus1007-8 - jclark@cumin1002"
13:52 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host alert1002.mgmt.eqiad.wmnet with reboot policy FORCED
13:29 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host alert1002
13:29 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host alert1002
13:23 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt allert1004 - jclark@cumin1002"
13:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt allert1004 - jclark@cumin1002"
13:20 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host alert1002.mgmt.eqiad.wmnet with reboot policy FORCED
08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T367856)', diff saved to https://phabricator.wikimedia.org/P67259 and previous config saved to /var/cache/conftool/dbconfig/20240809-080904-marostegui.json
08:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
08:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367856)', diff saved to https://phabricator.wikimedia.org/P67258 and previous config saved to /var/cache/conftool/dbconfig/20240809-080842-marostegui.json
07:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P67257 and previous config saved to /var/cache/conftool/dbconfig/20240809-075335-marostegui.json
07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P67256 and previous config saved to /var/cache/conftool/dbconfig/20240809-073828-marostegui.json
07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367856)', diff saved to https://phabricator.wikimedia.org/P67254 and previous config saved to /var/cache/conftool/dbconfig/20240809-072320-marostegui.json
05:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_main on wdqs1021.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
04:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15:00:00 on 9 hosts with reason: T364368 non-prod hosts
04:40 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 15:00:00 on 9 hosts with reason: T364368 non-prod hosts
04:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T364077, transfer to unpooled host (1022) to test cookbook changes) xfer wikidata from wdqs1012.eqiad.wmnet -> wdqs1022.eqiad.wmnet, repooling source-only afterwards
04:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, transfer to unpooled host (1022) to test cookbook changes) xfer wikidata from wdqs1012.eqiad.wmnet -> wdqs1022.eqiad.wmnet, repooling source-only afterwards
04:03 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 03s)
04:03 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
04:03 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 30s)
04:02 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
01:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bookworm

2024-08-08

22:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
22:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bookworm
21:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
21:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001.codfw.wmnet']
21:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001.codfw.wmnet']
21:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testhost2001.codfw.wmnet with OS bookworm
21:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
21:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
21:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
21:29 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: security update - bking@cumin2002 - T371874
21:21 ebernhardson@deploy1003: Synchronized private/PrivateSettings.php: Update NetworkSession users list for T341332 (duration: 06m 15s)
21:05 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: security update - bking@cumin2002 - T371874
20:21 samtar@deploy1003: Finished scap: Backport for Enable protection indicators for azwiki (T371440) (duration: 08m 22s)
20:17 samtar@deploy1003: samtar, nmw03: Continuing with sync
20:15 samtar@deploy1003: samtar, nmw03: Backport for Enable protection indicators for azwiki (T371440) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:13 samtar@deploy1003: Started scap sync-world: Backport for Enable protection indicators for azwiki (T371440)
20:12 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@0266527]: (no justification provided) (duration: 00m 49s)
20:11 milimetric@deploy1003: Started deploy [airflow-dags/analytics@0266527]: (no justification provided)
19:57 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
19:56 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
19:56 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
19:55 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
19:55 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
19:54 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
19:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
19:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bookworm
19:31 dancy@deploy1003: Started scap sync-world: testing T371904
19:23 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T371874
19:20 dancy@deploy1003: Started scap sync-world: testing T371904
19:13 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T371874
19:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
19:06 dancy@deploy1003: Finished scap: testing T371904 (duration: 02m 40s)
19:05 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
19:05 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1024.eqiad.wmnet with OS bullseye
19:03 dancy@deploy1003: Started scap sync-world: testing T371904
19:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2011.codfw.wmnet with OS bookworm
19:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2010.codfw.wmnet with OS bookworm
19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:58 ryankemper: [Elastic] `ryankemper@cumin2002:~$ sudo -E cumin 'elastic2062*,elastic2082*,elastic2088*,elastic2090*,elastic2099*,elastic2103*' 'pool'` (hosts that had not been repooled after previous maintenance)
18:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2011.codfw.wmnet with reason: host reimage
18:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
18:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2010.codfw.wmnet with reason: host reimage
18:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2011.codfw.wmnet with reason: host reimage
18:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2010.codfw.wmnet with reason: host reimage
18:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
18:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2011.codfw.wmnet with OS bookworm
18:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2010.codfw.wmnet with OS bookworm
17:45 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
17:44 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1024.eqiad.wmnet with OS bullseye
17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2009.codfw.wmnet with OS bookworm
17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:22 dreamyjazz@deploy1003: Finished scap: Backport for Convert gb_id to integer in GlobalBlock (T372063) (duration: 06m 48s)
17:17 dreamyjazz@deploy1003: urbanecm, dreamyjazz: Continuing with sync
17:17 dreamyjazz@deploy1003: urbanecm, dreamyjazz: Backport for Convert gb_id to integer in GlobalBlock (T372063) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for Convert gb_id to integer in GlobalBlock (T372063)
17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2009.codfw.wmnet with reason: host reimage
17:11 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2009.codfw.wmnet with reason: host reimage
17:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20240729/ using stat1009.eqiad.wmnet)
17:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2009.codfw.wmnet with OS bookworm
17:08 inflatador: bking@wdqs1020 restart wdqs-blazegraph service due to excessive GC
16:29 elukey: debmonitor-client 0.4.0 rolled out to all bullseye nodes
16:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
16:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating mgmt ips in codfw - jhancock@cumin2002"
16:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating mgmt ips in codfw - jhancock@cumin2002"
16:20 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bookworm
16:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
16:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bookworm
16:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bookworm
16:07 elukey: on cumin1002 "sudo cumin -b 20 -p 95 'P{F:lsbdistcodename="bullseye"} and A:codfw' 'run-puppet-agent -q --failed-only'"
16:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2010.codfw.wmnet with OS bookworm
16:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2009.codfw.wmnet with OS bookworm
16:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2010.codfw.wmnet with OS bookworm
16:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2009.codfw.wmnet with OS bookworm
16:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
15:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001.codfw.wmnet']
15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2011.codfw.wmnet with OS bookworm
15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2010.codfw.wmnet with OS bookworm
15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve2009.codfw.wmnet with OS bookworm
15:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001.codfw.wmnet']
15:21 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2004.codfw.wmnet with reason: Hardware maintenance for memory errors
15:21 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2004.codfw.wmnet with reason: Hardware maintenance for memory errors
15:16 Reedy: test
14:52 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Running sync-netbox-hiera manually because it failed during the reimage - fnegri@cumin1002 - T365424"
14:51 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Running sync-netbox-hiera manually because it failed during the reimage - fnegri@cumin1002 - T365424"
14:40 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
14:40 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
14:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2011.codfw.wmnet with OS bookworm
14:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2010.codfw.wmnet with OS bookworm
14:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2009.codfw.wmnet with OS bookworm
14:25 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1018.eqiad.wmnet with OS bookworm
14:24 fnegri@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002"
14:24 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002"
14:02 ladsgroup@deploy1003: ladsgroup: Backport for Add missing close tags to #contentSub message (T372054) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:01 stevemunene@deploy1003: Finished deploy [airflow-dags/analytics_test@2a3060e]: (no justification provided) (duration: 00m 33s)
14:00 stevemunene@deploy1003: Started deploy [airflow-dags/analytics_test@2a3060e]: (no justification provided)
13:59 ladsgroup@deploy1003: Started scap sync-world: Backport for Add missing close tags to #contentSub message (T372054)
13:51 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main'.
13:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
13:47 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
13:44 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: host reimage
13:41 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1018.eqiad.wmnet with reason: host reimage
13:28 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1018.eqiad.wmnet with OS bookworm
13:25 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1018.eqiad.wmnet with reason: Reimaging clouddb1018 T365424
13:25 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1018.eqiad.wmnet with reason: Reimaging clouddb1018 T365424
13:24 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
13:24 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
12:47 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.17 refs T366962
12:23 samtar@deploy1003: Finished scap: Backport for mswikisource: add custom logos (T372031) (duration: 08m 47s)
12:22 dcausse: T371401: reindexing wikidatawiki@codfw to index mul labels
12:18 samtar@deploy1003: chlod, samtar: Continuing with sync
12:18 samtar@deploy1003: chlod, samtar: Backport for mswikisource: add custom logos (T372031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:14 samtar@deploy1003: Started scap sync-world: Backport for mswikisource: add custom logos (T372031)
12:11 samtar@deploy1003: Finished scap: Backport for bdrwiki: add custom logos (T372031) (duration: 09m 20s)
12:06 samtar@deploy1003: chlod, samtar: Continuing with sync
12:05 samtar@deploy1003: chlod, samtar: Backport for bdrwiki: add custom logos (T372031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:01 samtar@deploy1003: Started scap sync-world: Backport for bdrwiki: add custom logos (T372031)
11:58 samtar@deploy1003: Finished scap: Backport for dtpwiki: add custom logos (T372031) (duration: 10m 10s)
11:53 samtar@deploy1003: chlod, samtar: Continuing with sync
11:52 samtar@deploy1003: chlod, samtar: Backport for dtpwiki: add custom logos (T372031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:48 samtar@deploy1003: Started scap sync-world: Backport for dtpwiki: add custom logos (T372031)
11:35 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
10:39 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.17 refs T366962
09:53 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.17 refs T366962
09:38 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Openjdk upgrade - elukey@cumin1002
09:37 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1009.eqiad.wmnet with reason: Rebooting due to CPU soft lockup
09:37 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1009.eqiad.wmnet with reason: Rebooting due to CPU soft lockup
09:32 dreamyjazz@deploy1003: Finished scap: Backport for Fix DefaultPresenter rejecting IPCountInfo instances (T371966) (duration: 10m 38s)
09:27 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
09:24 elukey: powercycle ml-serve2004 - host frozen, no ssh access, get sel shows "Multi-bit memory errors detected on a memory device at location(s) DIMM_A2."
09:23 dreamyjazz@deploy1003: dreamyjazz: Backport for Fix DefaultPresenter rejecting IPCountInfo instances (T371966) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:21 dreamyjazz@deploy1003: Started scap sync-world: Backport for Fix DefaultPresenter rejecting IPCountInfo instances (T371966)
08:45 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:30 dcausse: T371401: reindexing wikidatawiki@eqiad to index mul labels
08:23 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "vtrs1003+gerrit1004 - ayounsi@cumin1002"
08:23 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "vtrs1003+gerrit1004 - ayounsi@cumin1002"
08:19 elukey: restart dump_ip_reputation.service on puppetserver1001
08:13 elukey: restart tomcat on idp[1,2]003 to pick up the new openjdk
08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T367856)', diff saved to https://phabricator.wikimedia.org/P67252 and previous config saved to /var/cache/conftool/dbconfig/20240808-081041-marostegui.json
08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
08:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T367856)', diff saved to https://phabricator.wikimedia.org/P67251 and previous config saved to /var/cache/conftool/dbconfig/20240808-081019-marostegui.json
08:09 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Openjdk upgrade - elukey@cumin1002
07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P67250 and previous config saved to /var/cache/conftool/dbconfig/20240808-075512-marostegui.json
07:42 logmsgbot: @deploy1003 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:42 logmsgbot: @deploy1003 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
07:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P67249 and previous config saved to /var/cache/conftool/dbconfig/20240808-074005-marostegui.json
07:32 dcausse: T371401: reindexing testwikidatawiki to index mul labels
07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T367856)', diff saved to https://phabricator.wikimedia.org/P67248 and previous config saved to /var/cache/conftool/dbconfig/20240808-072458-marostegui.json
07:19 hashar: Restarted CI Jenkins for upgrade and plugin update # T371976
07:11 dcausse@deploy1003: Finished scap: Backport for search: index stems for mul labels (T371401) (duration: 09m 03s)
07:06 dcausse@deploy1003: dcausse: Continuing with sync
07:04 dcausse@deploy1003: dcausse: Backport for search: index stems for mul labels (T371401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:02 dcausse@deploy1003: Started scap sync-world: Backport for search: index stems for mul labels (T371401)
06:57 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
06:57 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:57 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
06:57 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
06:51 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
06:51 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
06:51 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
06:51 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
06:51 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
06:51 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
06:51 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
06:51 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
06:51 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
06:51 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-misc: apply
06:51 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
06:51 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
06:51 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
06:50 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
06:50 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
06:50 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
06:50 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
06:50 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
06:50 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
06:50 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
06:49 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
06:48 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
06:48 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-web: apply
06:48 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
06:47 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-web: apply
06:43 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
06:42 hashar: restarting Gerrit
06:42 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
06:41 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
06:41 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
06:41 oblivian@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
06:41 oblivian@deploy1003: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
06:35 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
02:19 cstone: civicrm upgraded from d1f1d7bd to 686c7c5f
00:30 rzl@deploy1003: Finished scap: https://gerrit.wikimedia.org/r/1060184 (duration: 02m 33s)
00:29 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1060184

2024-08-07

21:23 cstone: payments-wiki upgraded from 88500664 to a7f3301a
21:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host vrts1003.eqiad.wmnet with OS bookworm
21:03 cstone: payments-wiki upgraded from 49a9e765 to 88500664
21:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts1003.eqiad.wmnet with reason: host reimage
20:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts1003.eqiad.wmnet with reason: host reimage
20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gerrit1004.wikimedia.org with OS bookworm
20:53 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@4cf9922]: (no justification provided) (duration: 00m 38s)
20:53 milimetric@deploy1003: Started deploy [airflow-dags/analytics@4cf9922]: (no justification provided)
20:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1004.wikimedia.org with reason: host reimage
20:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host vrts1003.eqiad.wmnet with OS bookworm
20:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host vrts1003.mgmt.eqiad.wmnet with reboot policy FORCED
20:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1004.wikimedia.org with reason: host reimage
20:21 cjming: end of UTC late backport window
20:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host gerrit1004.wikimedia.org with OS bookworm
20:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gerrit1004.wikimedia.org with OS bookworm
20:11 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@049c09e]: (no justification provided) (duration: 00m 03s)
20:11 milimetric@deploy1003: Started deploy [airflow-dags/analytics@049c09e]: (no justification provided)
20:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host vrts1003.mgmt.eqiad.wmnet with reboot policy FORCED
20:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt vrts1003 - jclark@cumin1002"
20:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt vrts1003 - jclark@cumin1002"
20:01 jclark@cumin1002: START - Cookbook sre.dns.netbox
19:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1004.wikimedia.org with reason: host reimage
19:55 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1004.wikimedia.org with reason: host reimage
19:53 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@049c09e]: (no justification provided) (duration: 00m 59s)
19:52 milimetric@deploy1003: Started deploy [airflow-dags/analytics@049c09e]: (no justification provided)
19:52 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@216348d]: (no justification provided) (duration: 00m 47s)
19:51 milimetric@deploy1003: Started deploy [airflow-dags/analytics@216348d]: (no justification provided)
19:47 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@049c09e]: Deploying new Browser General job (duration: 00m 02s)
19:47 milimetric@deploy1003: Started deploy [airflow-dags/analytics@049c09e]: Deploying new Browser General job
19:46 milimetric@deploy1003: Finished deploy [airflow-dags/analytics@049c09e]: Deploying new Browser General job (duration: 00m 41s)
19:45 milimetric@deploy1003: Started deploy [airflow-dags/analytics@049c09e]: Deploying new Browser General job
19:39 ebernhardson@deploy1003: Finished deploy [airflow-dags/search@049c09e]: workaround process_sparql_query oom issues (duration: 00m 20s)
19:39 ebernhardson@deploy1003: Started deploy [airflow-dags/search@049c09e]: workaround process_sparql_query oom issues
19:38 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host gerrit1004.wikimedia.org with OS bookworm
19:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit1004.mgmt.eqiad.wmnet with reboot policy FORCED
19:33 brett: start pybal on lvs1017
19:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
19:29 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
19:18 brennen@deploy1003: Finished scap: Backport for Fix TypeError in PendingChanges by handling null subPage (T371986) (duration: 08m 23s)
19:14 brennen@deploy1003: brennen: Continuing with sync
19:12 brennen@deploy1003: brennen: Backport for Fix TypeError in PendingChanges by handling null subPage (T371986) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host gerrit1004.mgmt.eqiad.wmnet with reboot policy FORCED
19:11 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt gerrit1004 - jclark@cumin1002"
19:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt gerrit1004 - jclark@cumin1002"
19:10 brennen@deploy1003: Started scap sync-world: Backport for Fix TypeError in PendingChanges by handling null subPage (T371986)
19:08 jclark@cumin1002: START - Cookbook sre.dns.netbox
19:04 brett: stop pybal on lvs1017 for server reboot
19:00 brett: start pybal on lvs1018
18:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
18:56 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
18:45 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1296.eqiad.wmnet with OS bullseye
18:40 brett: stop pybal on lvs1018 for server reboot
18:39 milimetric@deploy1003: Finished deploy [analytics/refinery@fe20690]: Syncing browser general script hive version (duration: 16m 05s)
18:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
18:33 brett: start pybal on lvs1019
18:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
18:32 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1038.eqiad.wmnet with OS bullseye
18:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
18:28 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1296.eqiad.wmnet with OS bullseye
18:22 milimetric@deploy1003: Started deploy [analytics/refinery@fe20690]: Syncing browser general script hive version
18:20 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
18:17 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
18:14 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
18:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
18:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
18:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1296.eqiad.wmnet with reason: host reimage
17:54 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1038.eqiad.wmnet with OS bullseye
17:41 sukhe: running authdns-update for Yahoo CFL TXT record: T370963
17:35 brennen@deploy1003: Finished scap: Backport for Revert "Drop writeapi flag from siteinfo API" (T115414 T294397 T371977) (duration: 08m 06s)
17:34 milimetric@deploy1003: Finished deploy [analytics/refinery@0d25645] (thin): Syncing browser general script, and refinery-source 0.2.45 apparently (duration: 04m 21s)
17:31 brennen@deploy1003: brennen, bd808: Continuing with sync
17:30 milimetric@deploy1003: Started deploy [analytics/refinery@0d25645] (thin): Syncing browser general script, and refinery-source 0.2.45 apparently
17:29 brennen@deploy1003: brennen, bd808: Backport for Revert "Drop writeapi flag from siteinfo API" (T115414 T294397 T371977) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:29 milimetric@deploy1003: Finished deploy [analytics/refinery@0d25645]: Syncing browser general script, and refinery-source 0.2.45 apparently (duration: 54m 21s)
17:27 brennen@deploy1003: Started scap sync-world: Backport for Revert "Drop writeapi flag from siteinfo API" (T115414 T294397 T371977)
17:17 brett: stop pybal on lvs1019 for server reboot
17:14 brett: start pybal on lvs2014
17:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2014.codfw.wmnet
17:08 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet
17:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
16:42 brett: stop pybal on lvs2014 for server reboot
16:37 mutante: puppetserver1002 systemctl start dump_ip_reputation
16:34 milimetric@deploy1003: Started deploy [analytics/refinery@0d25645]: Syncing browser general script, and refinery-source 0.2.45 apparently
16:27 brett: start pybal on lvs2013
16:15 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1038.eqiad.wmnet with OS bullseye
16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P67246 and previous config saved to /var/cache/conftool/dbconfig/20240807-161452-ladsgroup.json
16:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet
16:08 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
16:01 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Openjdk upgrade - elukey@cumin1002
15:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
15:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
15:40 brett: stop pybal on lvs2013 for server reboot
08:18 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.17 refs T366962
07:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add role to mgmt devices - ayounsi@cumin1002"
07:43 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add role to mgmt devices - ayounsi@cumin1002"
03:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1296.eqiad.wmnet with OS bullseye
02:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1296.mgmt.eqiad.wmnet with reboot policy FORCED
02:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1296.mgmt.eqiad.wmnet with reboot policy FORCED
02:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
02:04 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
02:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1288.eqiad.wmnet with OS bullseye
02:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
02:02 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1294.eqiad.wmnet with OS bullseye
01:57 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1289.eqiad.wmnet with OS bullseye
01:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:53 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1290.eqiad.wmnet with OS bullseye
01:52 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
01:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1287.eqiad.wmnet with OS bullseye
01:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1296.eqiad.wmnet with OS bullseye
01:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1292.eqiad.wmnet with OS bullseye
01:42 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1293.eqiad.wmnet with OS bullseye
01:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1294.eqiad.wmnet with reason: host reimage
01:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1291.eqiad.wmnet with OS bullseye
01:36 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
01:35 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1295.eqiad.wmnet with OS bullseye
01:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1290.eqiad.wmnet with reason: host reimage
01:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1287.eqiad.wmnet with reason: host reimage
01:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1292.eqiad.wmnet with reason: host reimage
01:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1293.eqiad.wmnet with reason: host reimage
01:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1286.eqiad.wmnet with OS bullseye
01:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1285.eqiad.wmnet with OS bullseye
01:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1291.eqiad.wmnet with reason: host reimage
01:15 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1287.eqiad.wmnet with reason: host reimage
01:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1295.eqiad.wmnet with reason: host reimage
01:14 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1294.eqiad.wmnet with reason: host reimage
01:14 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1293.eqiad.wmnet with reason: host reimage
01:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1292.eqiad.wmnet with reason: host reimage
01:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1291.eqiad.wmnet with reason: host reimage
01:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1290.eqiad.wmnet with reason: host reimage
01:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
01:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
01:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1295.eqiad.wmnet with reason: host reimage
01:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
01:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
01:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1286.mgmt.eqiad.wmnet with reboot policy FORCED
01:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1286.mgmt.eqiad.wmnet with reboot policy FORCED
00:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1287.eqiad.wmnet with OS bullseye
00:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1294.eqiad.wmnet with OS bullseye
00:57 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1293.eqiad.wmnet with OS bullseye
00:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1292.eqiad.wmnet with OS bullseye
00:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1291.eqiad.wmnet with OS bullseye
00:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1290.eqiad.wmnet with OS bullseye
00:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1289.eqiad.wmnet with OS bullseye
00:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1288.eqiad.wmnet with OS bullseye
00:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1295.eqiad.wmnet with OS bullseye
00:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1284.eqiad.wmnet with OS bullseye
00:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1280.eqiad.wmnet with OS bullseye
00:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1281.eqiad.wmnet with OS bullseye
00:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1282.eqiad.wmnet with OS bullseye
00:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1279.eqiad.wmnet with OS bullseye
00:39 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:38 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1283.eqiad.wmnet with OS bullseye
00:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker1284.eqiad.wmnet with reason: host reimage
00:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1280.eqiad.wmnet with reason: host reimage
00:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker1281.eqiad.wmnet with reason: host reimage
00:20 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker1282.eqiad.wmnet with reason: host reimage
00:10 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker1283.eqiad.wmnet with reason: host reimage
00:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1279.eqiad.wmnet with reason: host reimage

2024-08-06

23:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1284.eqiad.wmnet with reason: host reimage
23:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1283.eqiad.wmnet with reason: host reimage
23:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1281.eqiad.wmnet with reason: host reimage
23:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1282.eqiad.wmnet with reason: host reimage
23:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1280.eqiad.wmnet with reason: host reimage
23:56 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1279.eqiad.wmnet with reason: host reimage
23:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1274.eqiad.wmnet with OS bullseye
23:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bullseye
23:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bullseye
23:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1271.eqiad.wmnet with OS bullseye
23:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1272.eqiad.wmnet with OS bullseye
23:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1283.eqiad.wmnet with OS bullseye
23:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1284.eqiad.wmnet with OS bullseye
23:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1282.eqiad.wmnet with OS bullseye
23:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1281.eqiad.wmnet with OS bullseye
23:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1280.eqiad.wmnet with OS bullseye
23:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1279.eqiad.wmnet with OS bullseye
23:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1276.eqiad.wmnet with OS bullseye
23:38 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:37 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1273.eqiad.wmnet with OS bullseye
23:34 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:34 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1277.eqiad.wmnet with OS bullseye
23:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1278.eqiad.wmnet with OS bullseye
23:33 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1274.eqiad.wmnet with reason: host reimage
23:28 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1271.eqiad.wmnet with reason: host reimage
23:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1275.eqiad.wmnet with OS bullseye
23:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1272.eqiad.wmnet with reason: host reimage
23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1270.eqiad.wmnet with OS bullseye
23:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:22 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage
23:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1271.eqiad.wmnet with reason: host reimage
23:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage
23:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1273.eqiad.wmnet with reason: host reimage
23:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1278.eqiad.wmnet with reason: host reimage
23:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1275.eqiad.wmnet with reason: host reimage
23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1270.eqiad.wmnet with reason: host reimage
23:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1278.eqiad.wmnet with reason: host reimage
23:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage
23:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage
23:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1275.eqiad.wmnet with reason: host reimage
23:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1274.eqiad.wmnet with reason: host reimage
23:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1273.eqiad.wmnet with reason: host reimage
23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1272.eqiad.wmnet with reason: host reimage
23:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1271.eqiad.wmnet with OS bullseye
23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1270.eqiad.wmnet with reason: host reimage
23:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1271.mgmt.eqiad.wmnet with reboot policy FORCED
22:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1278.eqiad.wmnet with OS bullseye
22:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1277.eqiad.wmnet with OS bullseye
22:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1276.eqiad.wmnet with OS bullseye
22:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1275.eqiad.wmnet with OS bullseye
22:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1274.eqiad.wmnet with OS bullseye
22:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1273.eqiad.wmnet with OS bullseye
22:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1272.eqiad.wmnet with OS bullseye
22:45 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1270.eqiad.wmnet with OS bullseye
22:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1271.mgmt.eqiad.wmnet with reboot policy FORCED
22:41 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1271 - jclark@cumin1002"
22:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1271 - jclark@cumin1002"
22:38 jclark@cumin1002: START - Cookbook sre.dns.netbox
22:34 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:32 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:35 kindrobot: UTC late backport window finished <3
21:34 kindrobot@deploy1003: Finished scap: Backport for Promote dark mode for anons on various wikis - take 2 (T371070 T371084), Enable NetworkSession extension for most wikis (T355267), fix(i18n): adjust broken mentorship eligibility copy (T371775 T370318), fix(i18n): adjust broken mentorship eligibility copy (T371775 T370318) (duration: 47m 05s)
21:25 kindrobot@deploy1003: toyofuku, ebernhardson, kindrobot, migr: Continuing with sync
21:21 kindrobot@deploy1003: toyofuku, ebernhardson, kindrobot, migr: Backport for Promote dark mode for anons on various wikis - take 2 (T371070 T371084), Enable NetworkSession extension for most wikis (T355267), fix(i18n): adjust broken mentorship eligibility copy (T371775 T370318), [[gerrit:1060136|fix(i18n): adjust broken mentorship eligibility copy (T371775 T37031
21:21 brett: start pybal on lvs6002
21:18 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
21:16 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
20:59 brett: stop pybal on lvs6002 for server reboot
20:56 kindrobot: UTC late backport window, deploy is extending beyond deployment window
20:47 kindrobot@deploy1003: Started scap sync-world: Backport for Promote dark mode for anons on various wikis - take 2 (T371070 T371084), Enable NetworkSession extension for most wikis (T355267), fix(i18n): adjust broken mentorship eligibility copy (T371775 T370318), fix(i18n): adjust broken mentorship eligibility copy (T371775 T370318)
20:27 brett: start pybal on lvs4009
20:26 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
20:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4009.ulsfo.wmnet
20:21 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs4009.ulsfo.wmnet
19:57 brett: stop pybal on lvs4009 for server reboot
19:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
19:49 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 02s)
19:49 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
19:44 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
19:44 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
19:42 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 16s)
19:42 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
19:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 18s)
19:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
19:35 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 09s)
19:35 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
19:21 sukhe: start pybal on lvs4008
19:19 sukhe: restart varnishmtail on cp3070
19:16 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4008.ulsfo.wmnet
19:13 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4008.ulsfo.wmnet
19:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2011
19:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2011
19:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2010
19:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2010
19:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2009
19:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2009
18:57 sukhe: sudo cumin "lvs4008*" 'disable-puppet "rebooting" && systemctl stop pybal.service'
18:49 dancy@deploy1003: Finished scap: testing T371904 (duration: 04m 14s)
18:48 sukhe: re-enable pybal on lvs6001
18:47 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
18:45 dancy@deploy1003: Started scap sync-world: testing T371904
18:44 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
18:44 dancy@deploy1003: Finished scap: testing T370934 (duration: 31m 05s)
18:41 brett: start pybal on lvs5005
18:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5005.eqsin.wmnet
18:33 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs5005.eqsin.wmnet
18:28 sukhe: sudo cumin "lvs6001*" 'disable-puppet "rebooting" && systemctl stop pybal.service'
18:18 brett: stop pybal on lvs5005 for server reboot
18:13 dancy@deploy1003: Started scap sync-world: testing T370934
17:53 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5004.eqsin.wmnet
17:51 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs5004.eqsin.wmnet
17:47 sukhe: stop pybal on lvs5004 for server reboot
17:40 mutante: CI - adding a new SSH key to jenkins - in the same file without removing the old key yet - this is expected to have no effect, but if CI breaks will revert - T177826
17:01 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s5
17:01 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
16:56 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1020.eqiad.wmnet with OS bookworm
16:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1023.eqiad.wmnet with OS bullseye
16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding payments200 to codfw - jhancock@cumin2002"
16:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding payments200 to codfw - jhancock@cumin2002"
16:35 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:23 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1020.eqiad.wmnet with reason: host reimage
16:21 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1020.eqiad.wmnet with reason: host reimage
16:08 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1020.eqiad.wmnet with OS bookworm
16:08 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent --enable 'upgrading anycast-hc'": finish anycast-hc upgrade: T370068
16:08 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent --enable 'upgrading anycast-hc'": finish anycast-hc upgrade
16:03 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Reimaging clouddb1020 T365424
16:03 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Reimaging clouddb1020 T365424
15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2011 to codfw - jhancock@cumin2002"
15:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2011 to codfw - jhancock@cumin2002"
15:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2010 to codfw - jhancock@cumin2002"
15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2010 to codfw - jhancock@cumin2002"
15:35 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:30 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:30 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:26 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:26 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:25 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns1006.wikimedia.org [reason: [done] anycast-healthchecker 0.9.8 upgrade]
15:25 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2035.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:23 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
15:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2035.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:23 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns1006.wikimedia.org [reason: anycast-healthchecker 0.9.8 upgrade]
15:21 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns1005.wikimedia.org [reason: [done] anycast-healthchecker 0.9.8 upgrade]
15:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
15:18 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns1005.wikimedia.org [reason: anycast-healthchecker 0.9.8 upgrade]
15:16 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns1004.wikimedia.org [reason: [done] anycast-healthchecker 0.9.8 upgrade]
15:14 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns1004.wikimedia.org [reason: anycast-healthchecker 0.9.8 upgrade]
15:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
15:11 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:10 cdanis: re-enabling puppet on cp nodes to deploy https://gerrit.wikimedia.org/r/1059126
15:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1296.mgmt.eqiad.wmnet with reboot policy FORCED
15:01 cdanis: disabling puppet on cp nodes to deploy https://gerrit.wikimedia.org/r/1059126
14:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1295.mgmt.eqiad.wmnet with reboot policy FORCED
14:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1294.mgmt.eqiad.wmnet with reboot policy FORCED
14:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1293.mgmt.eqiad.wmnet with reboot policy FORCED
14:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1291.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1292.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1290.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1289.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1288.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1286.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1287.mgmt.eqiad.wmnet with reboot policy FORCED
14:56 sukhe: disable puppet on A:dnsbox for cluster-wide anycast-hc 0.9.8 upgrade on remaining hosts: T370068
14:55 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns7002.wikimedia.org [reason: [done] anycast-healthchecker 0.9.8 upgrade]
14:53 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns7002.wikimedia.org [reason: anycast-healthchecker 0.9.8 upgrade]
14:53 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns7002.wikimedia.org,service=recdns [reason: anycast-healthchecker 0.9.8 upgrade]
14:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1296.mgmt.eqiad.wmnet with reboot policy FORCED
14:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
14:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1270.mgmt.eqiad.wmnet with reboot policy FORCED
14:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1272.mgmt.eqiad.wmnet with reboot policy FORCED
14:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1274.mgmt.eqiad.wmnet with reboot policy FORCED
14:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1275.mgmt.eqiad.wmnet with reboot policy FORCED
14:41 _joe_: repool cp4044
14:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1295.mgmt.eqiad.wmnet with reboot policy FORCED
14:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1294.mgmt.eqiad.wmnet with reboot policy FORCED
14:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1293.mgmt.eqiad.wmnet with reboot policy FORCED
14:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1292.mgmt.eqiad.wmnet with reboot policy FORCED
14:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1290.mgmt.eqiad.wmnet with reboot policy FORCED
14:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1289.mgmt.eqiad.wmnet with reboot policy FORCED
14:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1288.mgmt.eqiad.wmnet with reboot policy FORCED
14:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1286.mgmt.eqiad.wmnet with reboot policy FORCED
14:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1291.mgmt.eqiad.wmnet with reboot policy FORCED
14:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1287.mgmt.eqiad.wmnet with reboot policy FORCED
14:37 zabe@deploy1003: Finished scap: update interwiki cache (duration: 07m 10s)
14:36 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp3081*} and A:cp for 9.2.5-1wm2
14:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
14:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1278.mgmt.eqiad.wmnet with reboot policy FORCED
14:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1282.mgmt.eqiad.wmnet with reboot policy FORCED
14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1280.mgmt.eqiad.wmnet with reboot policy FORCED
14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1284.mgmt.eqiad.wmnet with reboot policy FORCED
14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1283.mgmt.eqiad.wmnet with reboot policy FORCED
14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1281.mgmt.eqiad.wmnet with reboot policy FORCED
14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1279.mgmt.eqiad.wmnet with reboot policy FORCED
14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1277.mgmt.eqiad.wmnet with reboot policy FORCED
14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1276.mgmt.eqiad.wmnet with reboot policy FORCED
14:33 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp3081*} and A:cp for 9.2.5-1wm2
14:30 zabe@deploy1003: Started scap sync-world: update interwiki cache
14:29 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'run-puppet-agent --enable "merging CR #1059123"'
14:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Reimaging clouddb1020 T365424
14:28 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Reimaging clouddb1020 T365424
14:25 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=bdrwiki --cluster=all 2>&1 | tee /tmp/bdrwiki.UpdateSearchIndexConfig.log # T371757
14:24 zabe@deploy1003: Finished scap: Creating bdrwiki (T371757) (duration: 06m 43s)
14:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host vrts2002.codfw.wmnet with OS bookworm
14:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:23 elukey: upgrade debmonitor-server on debmonitor[1,2]003 to version 0.5 - T368744
14:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
14:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
14:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1272.mgmt.eqiad.wmnet with reboot policy FORCED
14:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
14:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1274.mgmt.eqiad.wmnet with reboot policy FORCED
14:18 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1270.mgmt.eqiad.wmnet with reboot policy FORCED
14:18 zabe@deploy1003: Started scap sync-world: Creating bdrwiki (T371757)
14:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1270.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1274.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1272.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 zabe: Create Wikipedia West Coast Bajau # T371757
14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1270.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1275.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1274.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1273.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1272.mgmt.eqiad.wmnet with reboot policy FORCED
14:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2009 to codfw - jhancock@cumin2002"
14:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-serve2009 to codfw - jhancock@cumin2002"
14:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
14:12 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
14:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1278.mgmt.eqiad.wmnet with reboot policy FORCED
14:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1283.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1280.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1277.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1284.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1285.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1282.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1281.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1279.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1276.mgmt.eqiad.wmnet with reboot policy FORCED
14:08 zabe@deploy1003: Finished scap: Backport for TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455), Fix test that only works in June or July (T371577), TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455) (duration: 13m 22s)
14:07 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:07 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker12 - jclark@cumin1002"
14:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker12 - jclark@cumin1002"
14:03 zabe@deploy1003: abi, zabe: Continuing with sync
14:03 jclark@cumin1002: START - Cookbook sre.dns.netbox
14:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2002.codfw.wmnet with reason: host reimage
14:02 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:01 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
14:01 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s5
14:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2002.codfw.wmnet with reason: host reimage
14:00 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:56 zabe@deploy1003: abi, zabe: Backport for TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455), Fix test that only works in June or July (T371577), TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:54 zabe@deploy1003: Started scap sync-world: Backport for TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455), Fix test that only works in June or July (T371577), TranslatablePage: Use local cache to reduce calls to the WAN cache (T366455)
13:54 sukhe: upgrading A:wikidough to pdns-rec 4.8.8
13:53 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'disable-puppet "merging CR #1059123"'
13:51 zabe@deploy1003: Finished scap: T371060 (duration: 07m 57s)
13:43 zabe@deploy1003: Started scap sync-world: T371060
13:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host vrts2002.codfw.wmnet with OS bookworm
13:28 zabe@deploy1003: Finished scap: Backport for mywikisource: add portal, author and translation namespaces (T371060), dtpwiki: add timezone (T371076) (duration: 11m 28s)
13:24 zabe@deploy1003: anzx, zabe: Continuing with sync
13:20 zabe@deploy1003: anzx, zabe: Backport for mywikisource: add portal, author and translation namespaces (T371060), dtpwiki: add timezone (T371076) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:17 zabe@deploy1003: Started scap sync-world: Backport for mywikisource: add portal, author and translation namespaces (T371060), dtpwiki: add timezone (T371076)
13:14 zabe@deploy1003: Finished scap: Backport for group0, frwiki, itwiki: enable shellbox-video (T356241), [Growth] enwiki: Enable frontend for Add Link (T370802) (duration: 10m 41s)
13:13 _joe_: depooling cp4044 from traffic to apply new tls termination templates
13:09 zabe@deploy1003: hnowlan, urbanecm, zabe: Continuing with sync
13:08 zabe@deploy1003: hnowlan, urbanecm, zabe: Backport for group0, frwiki, itwiki: enable shellbox-video (T356241), [Growth] enwiki: Enable frontend for Add Link (T370802) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:03 zabe@deploy1003: Started scap sync-world: Backport for group0, frwiki, itwiki: enable shellbox-video (T356241), [Growth] enwiki: Enable frontend for Add Link (T370802)
12:58 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:58 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:39 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Openjdk upgrade - elukey@cumin1002
12:32 elukey: apt-get purge debmonitor-server + run-puppet-agent to re-install the daemon on debmonitor2003
12:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on debmonitor2003.codfw.wmnet with reason: failover test
12:31 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on debmonitor2003.codfw.wmnet with reason: failover test
12:21 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Openjdk upgrade - elukey@cumin1002
12:16 elukey: restart debmonitor-server on debmonitor1003
12:13 elukey: stop debmonitor-server on debmonitor1003 as temporary test
12:11 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on debmonitor1003.eqiad.wmnet with reason: failover test
12:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on debmonitor1003.eqiad.wmnet with reason: failover test
10:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T367856)', diff saved to https://phabricator.wikimedia.org/P67232 and previous config saved to /var/cache/conftool/dbconfig/20240806-100756-marostegui.json
10:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1218.eqiad.wmnet with reason: Maintenance
10:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1218.eqiad.wmnet with reason: Maintenance
10:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367856)', diff saved to https://phabricator.wikimedia.org/P67231 and previous config saved to /var/cache/conftool/dbconfig/20240806-100734-marostegui.json
09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P67229 and previous config saved to /var/cache/conftool/dbconfig/20240806-095226-marostegui.json
09:41 joe: upgrading conftool to 3.2.1 everywhere T369606
09:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P67228 and previous config saved to /var/cache/conftool/dbconfig/20240806-093719-marostegui.json
09:24 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Openjdk upgrade - elukey@cumin1002
09:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367856)', diff saved to https://phabricator.wikimedia.org/P67227 and previous config saved to /var/cache/conftool/dbconfig/20240806-092212-marostegui.json
09:07 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Openjdk upgrade - elukey@cumin1002
09:02 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Openjdk upgrade - elukey@cumin1002
08:43 topranks: shutting cloudsw1-d5-eqiad <-> cloudsw1-e4-eqiad link
08:42 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Openjdk upgrade - elukey@cumin1002
08:16 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.17 refs T366962
08:16 elukey: powercycle wdqs1023, misbehaving and not responding to ssh anymore
08:12 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=wdqs1023.eqiad.wmnet
07:50 kart_: Updated cxserver to 2024-08-05-063332-production (T371760, T357950)
07:49 oblivian@puppetserver1002: conftool action : set/weight=10; selector: cluster=videoscaler,name=mw1407.eqiad.wmnet
07:49 oblivian@puppetserver1002: conftool action : set/weight=1; selector: cluster=videoscaler,name=mw1407.eqiad.wmnet
07:46 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
07:45 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
07:44 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
07:44 _joe_: uploaded conftool 3.2.1 to apt.wikimedia.org
07:43 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
07:42 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
07:42 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
07:40 kart_: Updated MinT to 2024-08-05-062247-production (T363308, T355304, T368521)
07:37 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
07:34 elukey: powercycle ml-serve2001 - host seems frozen, DIMM errors registered in `getsel`
07:28 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
07:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 142108
07:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262725
07:15 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262725
07:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61928
07:15 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61928
07:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 265158
07:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 265158
07:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 264014
07:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 264014
07:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61642
07:13 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61642
07:13 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
07:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15169
07:05 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
07:03 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:58 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:50 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15169
05:41 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
04:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
04:38 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1021.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
04:37 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 09s)
04:37 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
04:36 ryankemper@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1023.eqiad.wmnet with OS bullseye
04:36 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host (duration: 00m 09s)
04:36 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: deploy to freshly reimaged host
04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.14 (duration: 00m 58s)
03:47 mwpresync@deploy1003: Finished scap: testwikis to 1.43.0-wmf.17 refs T366962 (duration: 45m 05s)
03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.17 refs T366962

2024-08-05

20:47 cjming: end of UTC late backport window
20:44 cjming@deploy1003: Finished scap: Backport for Add wikibase client interaction stream to Event Logging (T370045) (duration: 22m 52s)
20:39 cjming@deploy1003: cjming, joelyrookewmde: Continuing with sync
20:23 cjming@deploy1003: cjming, joelyrookewmde: Backport for Add wikibase client interaction stream to Event Logging (T370045) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:21 cjming@deploy1003: Started scap sync-world: Backport for Add wikibase client interaction stream to Event Logging (T370045)
19:29 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1021.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
19:14 otto@deploy1003: Finished scap: Backport for eventbus: enable instrumentation on all wikis (T363587) (duration: 07m 08s)
19:10 otto@deploy1003: otto: Continuing with sync
19:09 otto@deploy1003: otto: Backport for eventbus: enable instrumentation on all wikis (T363587) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:07 otto@deploy1003: Started scap sync-world: Backport for eventbus: enable instrumentation on all wikis (T363587)
18:56 dancy@deploy1003: sync-world aborted: testing scap 4.96.0 (duration: 03m 11s)
18:53 dancy@deploy1003: Started scap sync-world: testing scap 4.96.0
18:52 dancy@deploy1003: Installation of scap version "4.96.0" completed for 211 hosts
18:52 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1021.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ using stat1009.eqiad.wmnet)
18:52 dancy@deploy1003: Installing scap version "4.96.0" for 211 hosts
18:27 dancy@deploy1003: Started scap sync-world: testing updates to repos/releng/release/make-container-image
17:28 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1023.eqiad.wmnet with reason: host reimage
17:25 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1023.eqiad.wmnet with reason: host reimage
17:04 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
16:52 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:52 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:52 mutante: DNS - added new project language 'bdr' - West Coast Bajau - https://en.wikipedia.org/wiki/Sama%E2%80%93Bajaw_languages - T371757
16:36 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2008.codfw.wmnet with OS bookworm
16:33 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2007.codfw.wmnet with OS bookworm
16:20 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:19 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
16:18 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
16:15 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
16:12 filippo@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
16:11 filippo@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
16:11 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:10 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:53 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host prometheus2008.codfw.wmnet with OS bookworm
15:42 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
15:41 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2007.codfw.wmnet with OS bookworm
15:40 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
15:39 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2007.codfw.wmnet with OS bookworm
15:39 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
15:38 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
15:29 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
15:27 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
15:26 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
15:22 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
15:22 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
15:16 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1019.eqiad.wmnet with OS bookworm
15:15 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
15:15 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s6
15:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2239.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:03 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2239.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2240.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:52 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
14:52 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
14:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2240.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2238.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:43 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1006.eqiad.wmnet
14:36 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
14:35 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
14:35 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2238.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:35 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
14:25 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
14:25 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
14:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:20 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2007.codfw.wmnet with OS bookworm
14:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
14:11 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
14:04 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2237.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:02 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1019.eqiad.wmnet with reason: host reimage
14:01 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2237.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2236.mgmt.codfw.wmnet with reboot policy GRACEFUL
13:59 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1019.eqiad.wmnet with reason: host reimage
13:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2236.mgmt.codfw.wmnet with reboot policy GRACEFUL
13:44 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1019.eqiad.wmnet with OS bookworm
13:39 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
13:20 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1006.eqiad.wmnet with OS bookworm
13:07 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1006.eqiad.wmnet with reason: host reimage
13:04 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
13:03 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1006.eqiad.wmnet with reason: host reimage
13:00 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
12:58 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
12:57 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-web1001.eqiad.wmnet
12:57 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
12:55 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
12:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
12:52 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1006.eqiad.wmnet with OS bookworm
12:52 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-web1001.eqiad.wmnet
12:11 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1005.eqiad.wmnet with OS bookworm
11:56 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1005.eqiad.wmnet with reason: host reimage
11:53 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1005.eqiad.wmnet with reason: host reimage
11:42 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
11:42 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1005.eqiad.wmnet with OS bookworm
11:34 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
11:33 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1005.eqiad.wmnet
11:27 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1005.eqiad.wmnet
11:22 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1004.eqiad.wmnet
11:20 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
11:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary and A:netbox-all
11:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1004.eqiad.wmnet with OS bookworm
11:16 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1004.eqiad.wmnet
11:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1016.eqiad.wmnet
11:12 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary and A:netbox-all
11:11 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
11:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1002.eqiad.wmnet
11:06 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host snapshot1016.eqiad.wmnet
11:06 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1015.eqiad.wmnet
11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
11:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T367856)', diff saved to https://phabricator.wikimedia.org/P67222 and previous config saved to /var/cache/conftool/dbconfig/20240805-110512-marostegui.json
11:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1207.eqiad.wmnet with reason: Maintenance
11:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1207.eqiad.wmnet with reason: Maintenance
11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367856)', diff saved to https://phabricator.wikimedia.org/P67221 and previous config saved to /var/cache/conftool/dbconfig/20240805-110450-marostegui.json
11:04 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-client1002.eqiad.wmnet
11:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1004.eqiad.wmnet with reason: host reimage
11:00 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1004.eqiad.wmnet with reason: host reimage
11:00 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet
11:00 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1014.eqiad.wmnet
10:59 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
10:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet
10:50 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
10:49 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
10:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P67220 and previous config saved to /var/cache/conftool/dbconfig/20240805-104943-marostegui.json
10:49 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-conf1004.eqiad.wmnet with OS bookworm
10:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
10:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
10:37 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet
10:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts clouddb1021.eqiad.wmnet
10:36 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:36 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
10:35 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
10:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P67219 and previous config saved to /var/cache/conftool/dbconfig/20240805-103437-marostegui.json
10:34 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
10:31 btullis@cumin1002: START - Cookbook sre.dns.netbox
10:30 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
10:24 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts clouddb1021.eqiad.wmnet
10:22 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@537b288]: (no justification provided) (duration: 00m 36s)
10:22 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@537b288]: (no justification provided)
10:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367856)', diff saved to https://phabricator.wikimedia.org/P67218 and previous config saved to /var/cache/conftool/dbconfig/20240805-101930-marostegui.json
09:52 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:48 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
09:48 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
09:44 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
09:40 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
09:39 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
09:38 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
09:38 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
09:36 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
09:35 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
09:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddumps1002.wikimedia.org
09:27 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddumps1002.wikimedia.org
09:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddumps1001.wikimedia.org
09:16 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddumps1001.wikimedia.org
08:30 zabe@deploy1003: Finished scap: Backport for noc: Provide db-sections.php (duration: 22m 04s)
08:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox2003.codfw.wmnet
08:28 ayounsi@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netbox2003.codfw.wmnet
08:20 zabe@deploy1003: zabe: Continuing with sync
08:20 zabe@deploy1003: zabe: Backport for noc: Provide db-sections.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:20 vgutierrez: manually removing wmf_auto_restart_benthos@haproxy_cache.service on cp4037 - T370741
08:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox1003.eqiad.wmnet
08:11 ayounsi@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netbox1003.eqiad.wmnet
08:11 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
08:08 zabe@deploy1003: Started scap sync-world: Backport for noc: Provide db-sections.php
08:02 zabe: zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki 'Lirielmartinss' 'Ligg89' # T371784
08:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki "It'sMogli" 'ItsMogli' # T371784
06:55 XioNoX: push `LVS-service-ips` rename to ssw1-d8-codfw
06:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox1003.eqiad.wmnet

2024-08-04

15:44 mnz@deploy1003: Finished deploy [airflow-dags/research@d573c40]: (no justification provided) (duration: 00m 31s)
15:44 mnz@deploy1003: Started deploy [airflow-dags/research@d573c40]: (no justification provided)
11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T367856)', diff saved to https://phabricator.wikimedia.org/P67217 and previous config saved to /var/cache/conftool/dbconfig/20240804-113742-marostegui.json
11:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1206.eqiad.wmnet with reason: Maintenance
11:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1206.eqiad.wmnet with reason: Maintenance
11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367856)', diff saved to https://phabricator.wikimedia.org/P67216 and previous config saved to /var/cache/conftool/dbconfig/20240804-113720-marostegui.json
11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P67215 and previous config saved to /var/cache/conftool/dbconfig/20240804-112213-marostegui.json
11:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P67214 and previous config saved to /var/cache/conftool/dbconfig/20240804-110706-marostegui.json
10:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367856)', diff saved to https://phabricator.wikimedia.org/P67213 and previous config saved to /var/cache/conftool/dbconfig/20240804-105159-marostegui.json
05:54 ryankemper: [WDQS] Restart wdqs2010 to fix free allocators error

2024-08-03

16:53 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1022.eqiad.wmnet with OS bullseye
16:15 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
10:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T367856)', diff saved to https://phabricator.wikimedia.org/P67212 and previous config saved to /var/cache/conftool/dbconfig/20240803-100308-marostegui.json
10:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
10:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
10:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
10:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
10:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367856)', diff saved to https://phabricator.wikimedia.org/P67211 and previous config saved to /var/cache/conftool/dbconfig/20240803-100228-marostegui.json
09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P67210 and previous config saved to /var/cache/conftool/dbconfig/20240803-094721-marostegui.json
09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P67209 and previous config saved to /var/cache/conftool/dbconfig/20240803-093214-marostegui.json
09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367856)', diff saved to https://phabricator.wikimedia.org/P67208 and previous config saved to /var/cache/conftool/dbconfig/20240803-091707-marostegui.json
03:09 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1260.eqiad.wmnet with OS bullseye
02:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1023.eqiad.wmnet with OS bullseye
02:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1269.eqiad.wmnet with OS bullseye
02:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
02:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
02:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1266.eqiad.wmnet with OS bullseye
02:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1269.eqiad.wmnet with reason: host reimage
02:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1269.eqiad.wmnet with reason: host reimage
01:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1022.eqiad.wmnet with reason: host reimage
01:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1022.eqiad.wmnet with reason: host reimage
01:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1260.eqiad.wmnet with OS bullseye
01:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1268.eqiad.wmnet with OS bullseye
01:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1269.eqiad.wmnet with OS bullseye
01:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1267.eqiad.wmnet with OS bullseye
01:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1260.eqiad.wmnet with OS bullseye
01:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1268.eqiad.wmnet with reason: host reimage
01:29 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
01:28 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1022.eqiad.wmnet with OS bullseye
01:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1268.eqiad.wmnet with reason: host reimage
01:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1267.eqiad.wmnet with reason: host reimage
01:25 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1267.eqiad.wmnet with reason: host reimage
01:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on 9 hosts with reason: T364368 rejiggering hosts
01:15 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on 9 hosts with reason: T364368 rejiggering hosts
01:14 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1021.eqiad.wmnet with reason: host reimage
01:12 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1021.eqiad.wmnet with reason: host reimage
01:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1268.eqiad.wmnet with OS bullseye
01:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1265.eqiad.wmnet with OS bullseye
01:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:10 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1267.eqiad.wmnet with OS bullseye
01:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1264.eqiad.wmnet with OS bullseye
01:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1263.eqiad.wmnet with OS bullseye
01:05 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1262.eqiad.wmnet with OS bullseye
00:57 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1266.eqiad.wmnet with OS bullseye
00:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1261.eqiad.wmnet with OS bullseye
00:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on wdqs[2021-2022,2024-2025].codfw.wmnet with reason: T364368 rejiggering hosts
00:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on wdqs[2021-2022,2024-2025].codfw.wmnet with reason: T364368 rejiggering hosts
00:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1265.eqiad.wmnet with reason: host reimage
00:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1264.eqiad.wmnet with reason: host reimage
00:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1265.eqiad.wmnet with reason: host reimage
00:49 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
00:47 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1264.eqiad.wmnet with reason: host reimage
00:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1263.eqiad.wmnet with reason: host reimage
00:44 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1263.eqiad.wmnet with reason: host reimage
00:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1262.eqiad.wmnet with reason: host reimage
00:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1261.eqiad.wmnet with reason: host reimage
00:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1262.eqiad.wmnet with reason: host reimage
00:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1261.eqiad.wmnet with reason: host reimage
00:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1265.eqiad.wmnet with OS bullseye
00:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1258.eqiad.wmnet with OS bullseye
00:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1264.eqiad.wmnet with OS bullseye
00:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1259.eqiad.wmnet with OS bullseye
00:29 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:29 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:27 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1263.eqiad.wmnet with OS bullseye
00:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1258.eqiad.wmnet with reason: host reimage
00:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1259.eqiad.wmnet with reason: host reimage
00:18 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1262.eqiad.wmnet with OS bullseye
00:18 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1261.eqiad.wmnet with OS bullseye
00:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1256.eqiad.wmnet with OS bullseye
00:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:17 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1258.eqiad.wmnet with reason: host reimage
00:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:17 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1259.eqiad.wmnet with reason: host reimage
00:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1260.eqiad.wmnet with OS bullseye
00:14 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1259.eqiad.wmnet with OS bullseye
00:14 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1258.eqiad.wmnet with OS bullseye
00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1254.eqiad.wmnet with OS bullseye
00:13 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1255.eqiad.wmnet with OS bullseye
00:07 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
00:06 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1257.eqiad.wmnet with OS bullseye
00:05 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1254.eqiad.wmnet with reason: host reimage

2024-08-02

23:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1255.eqiad.wmnet with reason: host reimage
23:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1257.eqiad.wmnet with reason: host reimage
23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1255.eqiad.wmnet with reason: host reimage
23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1254.eqiad.wmnet with reason: host reimage
23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1257.eqiad.wmnet with reason: host reimage
23:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1255.eqiad.wmnet with OS bullseye
23:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1256.eqiad.wmnet with OS bullseye
23:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1254.eqiad.wmnet with OS bullseye
23:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1257.eqiad.wmnet with OS bullseye
23:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1253.eqiad.wmnet with OS bullseye
23:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1252.eqiad.wmnet with OS bullseye
23:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1251.eqiad.wmnet with OS bullseye
23:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1250.eqiad.wmnet with OS bullseye
23:36 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1253.eqiad.wmnet with reason: host reimage
23:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1253.eqiad.wmnet with reason: host reimage
23:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1252.eqiad.wmnet with reason: host reimage
23:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1253.eqiad.wmnet with OS bullseye
23:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1251.eqiad.wmnet with reason: host reimage
23:29 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1252.eqiad.wmnet with reason: host reimage
23:26 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1251.eqiad.wmnet with reason: host reimage
23:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1252.eqiad.wmnet with OS bullseye
23:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
23:24 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1251.eqiad.wmnet with OS bullseye
23:24 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
23:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1250.eqiad.wmnet with OS bullseye
23:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
22:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
22:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
22:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
21:55 ejegg: standalone (IPN listener) SmashPig upgraded from 1b2d9a6e to 5e784691
16:01 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@d573c40]: Deploy latest DAGs for analytics Airflow instance. T368756 (duration: 01m 02s)
16:00 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@d573c40]: Deploy latest DAGs for analytics Airflow instance. T368756
15:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2235.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:05 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2235.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2234.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:53 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2234.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2233.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2233.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2232.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2232.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:34 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2231.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2231.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2008.codfw.wmnet with OS bookworm
13:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
13:52 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
13:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2008.codfw.wmnet with OS bookworm
13:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2007.codfw.wmnet with OS bookworm
13:44 sukhe: running authdns-update for CR: 1059362 T371304
13:44 sukhe: running authdns-update for CR: T371304 1059362
13:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
13:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host alert2002.wikimedia.org with OS bookworm
13:35 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
13:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
13:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
13:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
13:11 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "prometheus - ayounsi@cumin1002"
13:10 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "prometheus - ayounsi@cumin1002"
11:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:23 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "alert2002 - ayounsi@cumin1002"
10:18 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "alert2002 - ayounsi@cumin1002"
10:18 elukey: manually start dump_cloud_ip_ranges.service on puppetmaster1001 as test
10:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T367856)', diff saved to https://phabricator.wikimedia.org/P67203 and previous config saved to /var/cache/conftool/dbconfig/20240802-090649-marostegui.json
09:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1195.eqiad.wmnet with reason: Maintenance
09:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1195.eqiad.wmnet with reason: Maintenance
09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367856)', diff saved to https://phabricator.wikimedia.org/P67202 and previous config saved to /var/cache/conftool/dbconfig/20240802-090627-marostegui.json
08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P67201 and previous config saved to /var/cache/conftool/dbconfig/20240802-085119-marostegui.json
08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P67200 and previous config saved to /var/cache/conftool/dbconfig/20240802-083612-marostegui.json
08:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367856)', diff saved to https://phabricator.wikimedia.org/P67199 and previous config saved to /var/cache/conftool/dbconfig/20240802-082105-marostegui.json
08:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:37 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Sbailey out of all services on: 2241 hosts
07:36 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging Sbailey out of all services on: 2241 hosts
02:09 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1260
02:08 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1260
02:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1267.mgmt.eqiad.wmnet with reboot policy FORCED
02:07 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
02:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1261.mgmt.eqiad.wmnet with reboot policy FORCED
02:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1266.mgmt.eqiad.wmnet with reboot policy FORCED
02:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1269.mgmt.eqiad.wmnet with reboot policy FORCED
02:03 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
02:01 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
01:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1263.mgmt.eqiad.wmnet with reboot policy FORCED
01:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1268.mgmt.eqiad.wmnet with reboot policy FORCED
01:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1262.mgmt.eqiad.wmnet with reboot policy FORCED
01:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1265.mgmt.eqiad.wmnet with reboot policy FORCED
01:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1264.mgmt.eqiad.wmnet with reboot policy FORCED
01:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1267.mgmt.eqiad.wmnet with reboot policy FORCED
01:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1268.mgmt.eqiad.wmnet with reboot policy FORCED
01:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1269.mgmt.eqiad.wmnet with reboot policy FORCED
01:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1266.mgmt.eqiad.wmnet with reboot policy FORCED
01:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1264.mgmt.eqiad.wmnet with reboot policy FORCED
01:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1265.mgmt.eqiad.wmnet with reboot policy FORCED
01:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1261.mgmt.eqiad.wmnet with reboot policy FORCED
01:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
01:28 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1263.mgmt.eqiad.wmnet with reboot policy FORCED
01:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1261.mgmt.eqiad.wmnet with reboot policy FORCED
01:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
01:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1262.mgmt.eqiad.wmnet with reboot policy FORCED
01:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1261.mgmt.eqiad.wmnet with reboot policy FORCED
01:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1260.mgmt.eqiad.wmnet with reboot policy FORCED
01:25 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:25 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1260-9 - jclark@cumin1002"
01:25 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1260-9 - jclark@cumin1002"
01:22 jclark@cumin1002: START - Cookbook sre.dns.netbox
01:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1250.eqiad.wmnet with OS bullseye
00:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
00:57 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1257.eqiad.wmnet with OS bullseye
00:55 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
00:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1252.eqiad.wmnet with OS bullseye
00:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1250.eqiad.wmnet with OS bullseye
00:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1250.mgmt.eqiad.wmnet with reboot policy FORCED
00:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1259.eqiad.wmnet with OS bullseye
00:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1250.mgmt.eqiad.wmnet with reboot policy FORCED
00:48 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1253.eqiad.wmnet with OS bullseye
00:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1258.eqiad.wmnet with OS bullseye
00:43 zabe@deploy1003: Finished scap: Backport for Further configurations for u4cwiki (T371452) (duration: 07m 24s)
00:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1254.eqiad.wmnet with OS bullseye
00:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1257.eqiad.wmnet with reason: host reimage
00:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1255.eqiad.wmnet with OS bullseye
00:39 zabe@deploy1003: zabe: Continuing with sync
00:38 zabe@deploy1003: zabe: Backport for Further configurations for u4cwiki (T371452) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
00:38 zabe: zabe@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php u4cwiki translate # T371452
00:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1252.eqiad.wmnet with reason: host reimage
00:36 zabe@deploy1003: Started scap sync-world: Backport for Further configurations for u4cwiki (T371452)
00:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1256.eqiad.wmnet with OS bullseye
00:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1259.eqiad.wmnet with reason: host reimage
00:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1253.eqiad.wmnet with reason: host reimage
00:30 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1250.eqiad.wmnet with OS bullseye
00:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
00:28 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1251.eqiad.wmnet with OS bullseye
00:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1258.eqiad.wmnet with reason: host reimage
00:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1254.eqiad.wmnet with reason: host reimage
00:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1255.eqiad.wmnet with reason: host reimage
00:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
00:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
00:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
00:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
00:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
00:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1255.eqiad.wmnet with reason: host reimage
00:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1251.eqiad.wmnet with reason: host reimage
00:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
00:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1258.eqiad.wmnet with reason: host reimage
00:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1259.eqiad.wmnet with reason: host reimage
00:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1257.eqiad.wmnet with reason: host reimage
00:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1254.eqiad.wmnet with reason: host reimage
00:09 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1252.eqiad.wmnet with reason: host reimage
00:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1253.eqiad.wmnet with reason: host reimage
00:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1250.eqiad.wmnet with reason: host reimage
00:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1251.eqiad.wmnet with reason: host reimage

2024-08-01

23:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1255.eqiad.wmnet with OS bullseye
23:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1256.eqiad.wmnet with OS bullseye
23:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1259.eqiad.wmnet with OS bullseye
23:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1258.eqiad.wmnet with OS bullseye
23:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1257.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1254.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1253.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1252.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1251.eqiad.wmnet with OS bullseye
23:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1250.eqiad.wmnet with OS bullseye
23:37 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:36 zabe@deploy1003: Finished scap: Backport for Automatically set db section to s5 for new wiki (duration: 07m 20s)
23:34 jclark@cumin1002: START - Cookbook sre.dns.netbox
23:31 zabe@deploy1003: zabe: Continuing with sync
23:31 zabe@deploy1003: zabe: Backport for Automatically set db section to s5 for new wiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:28 zabe@deploy1003: Started scap sync-world: Backport for Automatically set db section to s5 for new wiki
22:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367856)', diff saved to https://phabricator.wikimedia.org/P67198 and previous config saved to /var/cache/conftool/dbconfig/20240801-223711-marostegui.json
22:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P67197 and previous config saved to /var/cache/conftool/dbconfig/20240801-222204-marostegui.json
22:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P67196 and previous config saved to /var/cache/conftool/dbconfig/20240801-220657-marostegui.json
21:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367856)', diff saved to https://phabricator.wikimedia.org/P67195 and previous config saved to /var/cache/conftool/dbconfig/20240801-215150-marostegui.json
20:40 thcipriani: utc late window complete
20:28 thcipriani@deploy1003: Finished scap: Backport for revisionCheck: skip null wikiPages (T371348) (duration: 09m 19s)
20:23 thcipriani@deploy1003: thcipriani, jsn: Continuing with sync
20:20 thcipriani@deploy1003: thcipriani, jsn: Backport for revisionCheck: skip null wikiPages (T371348) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:18 thcipriani@deploy1003: Started scap sync-world: Backport for revisionCheck: skip null wikiPages (T371348)
20:01 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:01 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: decomission of frdb2002, payments2001, and payments2002 - dwisehaupt@cumin1002"
20:01 dwisehaupt@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: decomission of frdb2002, payments2001, and payments2002 - dwisehaupt@cumin1002"
19:56 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
19:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
19:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
18:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=(cdn|ats-be)
18:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
18:29 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
18:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
18:10 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.16 refs T366961
18:00 brennen: 1.43.0-wmf.16 train (T366961): no current blockers, logs cluttered but not too scary, rolling to all wikis.
17:58 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
17:58 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
17:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2007.codfw.wmnet with OS bookworm
17:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
17:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
17:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
17:21 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
17:19 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
16:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2008.codfw.wmnet with OS bookworm
16:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2007.codfw.wmnet with OS bookworm
16:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
16:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
16:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
16:25 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
16:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2008.codfw.wmnet with OS bookworm
16:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
16:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2007.codfw.wmnet with OS bookworm
16:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2008.codfw.wmnet with OS bookworm
15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2007.codfw.wmnet with reason: host reimage
15:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
15:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
15:47 volans: installing spicerack v8.10.0 to cumin1002
15:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
15:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus2008.codfw.wmnet with reason: host reimage
15:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
15:45 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
15:44 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
15:43 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
15:34 jgiannelos@deploy1003: Finished deploy [restbase/deploy@f696b76]: (no justification provided) (duration: 17m 07s)
15:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy GRACEFUL
15:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2008.codfw.wmnet with OS bookworm
15:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2007.codfw.wmnet with OS bookworm
15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['prometheus2008']
15:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['prometheus2008']
15:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host prometheus2008.mgmt.codfw.wmnet with reboot policy FORCED
15:23 volans: installing spicerack v8.10.0 to cumin2002
15:17 jgiannelos@deploy1003: Started deploy [restbase/deploy@f696b76]: (no justification provided)
15:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['prometheus2007']
15:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['prometheus2007']
15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host prometheus2007.mgmt.codfw.wmnet with reboot policy FORCED
15:04 elukey: rollback debmonitor-server to 0.4.0-3 on debmonitor2003
15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host prometheus2008.mgmt.codfw.wmnet with reboot policy FORCED
15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host prometheus2007.mgmt.codfw.wmnet with reboot policy FORCED
15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding prometheus2007 to codfw - jhancock@cumin2002"
15:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding prometheus2007 to codfw - jhancock@cumin2002"
14:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host kubestage1003.eqiad.wmnet
14:54 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host kubestage1003.eqiad.wmnet
14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) for host kubestage1003.eqiad.wmnet
14:53 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node for host kubestage1003.eqiad.wmnet
14:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:46 zabe@deploy1003: Finished scap: Backport for Move section mapping to separate file (duration: 08m 06s)
14:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:41 zabe@deploy1003: zabe: Continuing with sync
14:40 zabe@deploy1003: zabe: Backport for Move section mapping to separate file synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:38 zabe@deploy1003: Started scap sync-world: Backport for Move section mapping to separate file
14:34 elukey: uploaded spicerack_8.10.0 to apt.wikimedia.org bullseye-wikimedia
14:31 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
14:31 fabfur: repool cp4037 (T370741)
14:28 elukey: upgrade debmonitor-server on debmonitor2003 to 0.5.0
14:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
14:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
14:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
14:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
14:16 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2002.wikimedia.org with OS bookworm
14:05 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy GRACEFUL
13:52 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy GRACEFUL
13:49 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) for host kubestage1003.eqiad.wmnet
13:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node for host kubestage1003.eqiad.wmnet
13:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy GRACEFUL
13:44 cdanis@deploy1003: Finished scap: Backport for Increase IP cap limit for azwiki (T371439) (duration: 07m 28s)
13:40 cdanis@deploy1003: cdanis, nmw03: Continuing with sync
13:40 cdanis@deploy1003: cdanis, nmw03: Backport for Increase IP cap limit for azwiki (T371439) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy GRACEFUL
13:37 cdanis@deploy1003: Started scap sync-world: Backport for Increase IP cap limit for azwiki (T371439)
13:19 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
13:18 fabfur: depool cp4037 to test remove benthos package / conffiles (T370741)
13:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns4003.wikimedia.org,service=recdns [reason: [done] pdns-rec upgrade]
13:06 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns4003.wikimedia.org,service=recdns [reason: pdns-rec upgrade]
13:03 isaranto@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
13:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
13:00 urbanecm@deploy1003: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync
12:59 urbanecm@deploy1003: helmfile [staging] START helmfile.d/services/linkrecommendation: sync
12:59 urbanecm@deploy1003: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync
12:58 urbanecm@deploy1003: helmfile [codfw] START helmfile.d/services/linkrecommendation: sync
12:55 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync
12:55 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: sync
12:55 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
12:55 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
12:52 elukey@cumin1002: START - Cookbook sre.hosts.provision for host gerrit2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
12:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
12:39 urbanecm: Decommission Add Link models for akwiki, nawiki (T371598)
12:37 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on alert2002.wikimedia.org with reason: host reimage
12:26 isaranto@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:19 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=dewiki --olderThan=1721045915 --verbose # T371597
12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host alert2002.wikimedia.org with OS bookworm
12:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['alert2002']
12:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['vrts2002']
12:10 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['alert2002']
12:10 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['vrts2002']
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) for host kubestage1003.eqiad.wmnet
12:09 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node for host kubestage1003.eqiad.wmnet
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) for host kubestage1003.eqiad.wmnet
12:06 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node for host kubestage1003.eqiad.wmnet
11:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
11:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
11:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
11:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
11:48 kevinbazira@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67192 and previous config saved to /var/cache/conftool/dbconfig/20240801-113108-root.json
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67191 and previous config saved to /var/cache/conftool/dbconfig/20240801-111602-root.json
11:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67190 and previous config saved to /var/cache/conftool/dbconfig/20240801-110057-root.json
10:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67189 and previous config saved to /var/cache/conftool/dbconfig/20240801-104551-root.json
10:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67188 and previous config saved to /var/cache/conftool/dbconfig/20240801-103046-root.json
10:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67187 and previous config saved to /var/cache/conftool/dbconfig/20240801-101541-root.json
10:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67186 and previous config saved to /var/cache/conftool/dbconfig/20240801-100035-root.json
09:54 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:44 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1006.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1233.eqiad.wmnet with reason: Maintenance
09:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1233.eqiad.wmnet with reason: Maintenance
09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1233', diff saved to https://phabricator.wikimedia.org/P67185 and previous config saved to /var/cache/conftool/dbconfig/20240801-093123-marostegui.json
09:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1006.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:24 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1005.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:16 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1005.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1004.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:00 elukey@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1004.mgmt.eqiad.wmnet with reboot policy GRACEFUL
08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2230.mgmt.codfw.wmnet with reboot policy GRACEFUL
08:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2230.mgmt.codfw.wmnet with reboot policy GRACEFUL
08:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
08:48 ayounsi@cumin1002: START - Cookbook sre.postgresql.postgres-init
08:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2229.mgmt.codfw.wmnet with reboot policy GRACEFUL
08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T367856)', diff saved to https://phabricator.wikimedia.org/P67184 and previous config saved to /var/cache/conftool/dbconfig/20240801-084409-marostegui.json
08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
08:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367856)', diff saved to https://phabricator.wikimedia.org/P67183 and previous config saved to /var/cache/conftool/dbconfig/20240801-084347-marostegui.json
08:35 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2229.mgmt.codfw.wmnet with reboot policy GRACEFUL
08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P67182 and previous config saved to /var/cache/conftool/dbconfig/20240801-082840-marostegui.json
08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P67181 and previous config saved to /var/cache/conftool/dbconfig/20240801-081333-marostegui.json
08:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1246.eqiad.wmnet with reason: Maintenance
08:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1246.eqiad.wmnet with reason: Maintenance
08:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
08:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
08:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "netbox4 sync - ayounsi@cumin1002"
08:04 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "netbox4 sync - ayounsi@cumin1002"
07:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367856)', diff saved to https://phabricator.wikimedia.org/P67180 and previous config saved to /var/cache/conftool/dbconfig/20240801-075826-marostegui.json
07:47 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
07:47 ayounsi@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox 4 sync - ayounsi@cumin1002"
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T367856)', diff saved to https://phabricator.wikimedia.org/P67179 and previous config saved to /var/cache/conftool/dbconfig/20240801-074507-marostegui.json
07:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
07:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
07:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367856)', diff saved to https://phabricator.wikimedia.org/P67178 and previous config saved to /var/cache/conftool/dbconfig/20240801-074445-marostegui.json
07:43 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deploy1002.eqiad.wmnet
07:43 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:41 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
07:39 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox 4 sync - ayounsi@cumin1002"
07:36 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
07:36 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
07:32 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
07:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67177 and previous config saved to /var/cache/conftool/dbconfig/20240801-072938-marostegui.json
07:21 akosiaris@cumin1002: START - Cookbook sre.hosts.decommission for hosts deploy1002.eqiad.wmnet
07:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P67176 and previous config saved to /var/cache/conftool/dbconfig/20240801-071431-marostegui.json
07:04 akosiaris: uncordon parse2001, parse1001 T359387
06:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367856)', diff saved to https://phabricator.wikimedia.org/P67175 and previous config saved to /var/cache/conftool/dbconfig/20240801-065924-marostegui.json
06:48 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
06:45 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
06:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:39 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
01:01 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:58 sukhe@cumin1002: START - Cookbook sre.dns.netbox
00:53 sukhe: run authdns-update

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

2010s

2020-2024

2025-present