Search Platform/Weekly Updates/2024-06-07
Summary
We're working on communication following the feedback period for the graph split, and on finalizing the implementation.
All writes (except private wikis) switched to the new search update pipeline, leading to a significant decrease in PHP worker saturation.
What we've accomplished
WDQS graph splitting
- Updated on-wiki pages related to the split:
  - Added a note regarding the end of the feedback period: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/WDQS_Split_Refinement
  - Created a simplified version of the definition of the split: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Rules
- Prepared a test of the new wdqs.data-reload cookbook: https://phabricator.wikimedia.org/P64016
- Continued work on update pipeline - https://phabricator.wikimedia.org/T361935
Search Update Pipeline
- All writes (except private wikis) switched to the new pipeline - https://phabricator.wikimedia.org/T363475
- Significant decrease in PHP worker saturation related to this: https://grafana-rw.wikimedia.org/d/U7JT--knk/mediawiki-on-k8s?forceLogin=true&from=1716422400000&orgId=1&to=1717027200000&var-container_name=All&var-dc=eqiad+prometheus%2Fk8s&var-namespace=mw-jobrunner&var-release=main&var-service=mediawiki&var-site=&viewPanel=84
Search Metrics
- Automate search metrics notebooks and integrate with Airflow - https://phabricator.wikimedia.org/T364599
Misc
- Fixed a small problem that caused the drop_old_data_daily DAG to fail: I had misconfigured a table that does not have a wiki partition
- Reindexing across all three clusters is complete with the new orchestration. Based on this run, we refactored the orchestration to operate on a per-index basis instead of a per-wiki basis - https://phabricator.wikimedia.org/T363734 This completes the work on various search harmonization efforts: