Search Platform/Weekly Updates/2023-06-23
Summary
We've been dealing with more infrastructure complications this week. Search was broken over the weekend on Commons and Wikidata (https://wikitech.wikimedia.org/wiki/Incidents/2023-06-18_search_broken_on_wikidata_and_commons / https://phabricator.wikimedia.org/T339810). Missing canary events in EventGate created some additional issues in our downstream data pipelines that required manual intervention.
What we've accomplished
Improve multilingual zero-results rate
- Progress on smarter handling of acronyms - https://phabricator.wikimedia.org/T170625
- Progress on aggressive splitting - https://phabricator.wikimedia.org/T219108
Search Update Pipeline
- Progress on deduplication of the update stream and support for redirects - https://phabricator.wikimedia.org/T325672 / https://phabricator.wikimedia.org/T325315
WDQS Split the graph
- Started documentation of the data analysis needs - https://docs.google.com/document/d/1QsV96LtpK5lDD2N2jy-6vaF_0d_Yf_HLb8uFARFMxJ8/edit
Operations
- Canary events stopped being produced, causing downstream issues with most data pipelines. Manual steps were needed to recover. The root cause is being tracked by the Data Engineering team (https://phabricator.wikimedia.org/T340166). A larger conversation is needed to make the system more robust and easier to recover after failures.
Misc
- Search was broken on Commons and Wikidata over the weekend - https://wikitech.wikimedia.org/wiki/Incidents/2023-06-18_search_broken_on_wikidata_and_commons