Search Platform/Weekly Updates/2024-05-24
Summary
For our codfw and cloudelastic clusters, 100% of updates (excluding private wikis) have been migrated to our new Search Update Pipeline; for the eqiad cluster, 25% have been migrated. We expect to complete the migration next week.
The recent Signpost article (https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2024-05-16/Op-Ed) has generated some additional discussion around the graph split. We're following the discussion and engaging with it.
What we've accomplished
Search Update Pipeline
- All updates (except private wikis) for codfw and cloudelastic, and 25% of updates for eqiad, have been migrated to the new search update pipeline. We expect to shift the rest of the traffic next week - https://phabricator.wikimedia.org/T363475
- Small patch to stop shipping cirrusLinksUpdate jobs when there are no writable clusters available. These jobs would do nothing, so this should save some resources on jobrunners, kafka and changeprop - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1035429
- Fixed some Wikidata items not being parsed correctly due to integer overflow - https://phabricator.wikimedia.org/T364837
- Cannot provide empty array to wikis as $wgCirrusSearchWriteClusters - https://phabricator.wikimedia.org/T365190
WDQS graph splitting
- Responded to a comment on the refinement talk page - https://www.wikidata.org/wiki/Wikidata_talk:SPARQL_query_service/WDQS_graph_split/WDQS_Split_Refinement
- Deployed the Airflow change to split the graph daily with Spark
- The data-reload cookbooks have been reviewed; we should be able to test them next week - https://phabricator.wikimedia.org/T349069
- Generalized ScholarlyArticleSplitter - https://phabricator.wikimedia.org/T362060
Search Metrics
- We now have a Superset dashboard for search metrics (https://superset.wikimedia.org/superset/dashboard/search/) - https://phabricator.wikimedia.org/T358345 (and a number of subtasks)
Misc
- Fixed failing tests on Glent with Java 11 - https://phabricator.wikimedia.org/T350974
- Fixed image suggestion pipeline that was failing on empty partitions - https://phabricator.wikimedia.org/T358472