Search Platform/Weekly Updates/2024-04-19
Summary
An incident on WDQS in the codfw datacenter required manual investigation and intervention, which took up significant time this week. User impact was mitigated by depooling codfw and serving all traffic from the eqiad datacenter.
A few highlights for the week:
- Communication about the graph split was sent. We are waiting for feedback until May 15, after which we will freeze the graph split strategy.
- We have a few stability fixes to the Search Update Pipeline. The pipeline has been deployed for all wikis on Cloudelastic, and we will shortly start enabling it for the production search cluster.
What we've accomplished
Improve multilingual zero-results rate
- After investigation, we will not enable hiragana-to-katakana mapping on all wikis as planned. Instead, we will disable it on all wikis, in the spirit of language harmonization and because it does not seem to bring much value. Full write-up: https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Analyzer_Harmonization_Notes#Enab..._no,_wait.._Disable_Hiragana-to-Katakana_Mapping_(T180387) - https://phabricator.wikimedia.org/T180387
- Started work on the Yiddish ligature issue recently reported by our users. It is similar to an Arabic normalization issue reported almost 10 years ago, which we will address at the same time - https://phabricator.wikimedia.org/T362501 / https://phabricator.wikimedia.org/T72899
- Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair - https://phabricator.wikimedia.org/T342444
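For readers unfamiliar with the hiragana-to-katakana mapping mentioned above, the idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the analysis-chain configuration used on the wikis: the function name `hiragana_to_katakana` is hypothetical, and the sketch relies on the fact that the Unicode hiragana letters U+3041 to U+3096 sit exactly 0x60 code points below their katakana counterparts.

```python
# Illustrative sketch only; not the production analyzer configuration.
HIRAGANA_START = 0x3041  # HIRAGANA LETTER SMALL A
HIRAGANA_END = 0x3096    # HIRAGANA LETTER SMALL KE
KANA_OFFSET = 0x60       # distance from a hiragana letter to its katakana twin


def hiragana_to_katakana(text: str) -> str:
    """Map each hiragana letter to katakana; leave all other characters as-is."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET)
        if HIRAGANA_START <= ord(ch) <= HIRAGANA_END
        else ch
        for ch in text
    )


if __name__ == "__main__":
    print(hiragana_to_katakana("ひらがな"))
```

With such a mapping enabled, a query typed in hiragana would match the same term indexed in katakana. Disabling it keeps the two scripts distinct.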
WDQS Graph Split
- Communication sent: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update/April_2024_scaling_update
- Federation limits are documented: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits
- Example queries with federation are published: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples
Search Update Pipeline
- Deploy streaming updater for 100% of writes to cloudelastic - https://phabricator.wikimedia.org/T358518
- Improvement to stability: Streaming Updater should still make forward progress when one index has problems - https://phabricator.wikimedia.org/T356933
- Improvement to stability: SUP: Allow precise control over connection pool size AND operator capacity - https://phabricator.wikimedia.org/T361900
- Saneitizer has been integrated and deployed as part of the new update pipeline. It is helping us discover a few more issues with pages in an error state - https://phabricator.wikimedia.org/T358599
Operations
- Incident on WDQS@codfw where the streaming updater failed and reverted to an expensive reconciliation process for all changes, reducing the overall throughput enough that the updater could not catch up with the edit rate. We don't have a full understanding of the root cause, but we do have a few improvements to the general stability of the system - https://phabricator.wikimedia.org/T362508
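The reason reduced throughput prevents an updater from catching up can be shown with a toy backlog model. This is a minimal sketch under stated assumptions: the function `simulate_lag` and its parameters are hypothetical, the rates are arbitrary units per time step, and nothing here reflects measured values from the incident.

```python
# Toy model: backlog grows whenever edits arrive faster than they are processed.
# All names and numbers are hypothetical, for illustration only.
def simulate_lag(edit_rate, throughput, initial_lag=0.0, steps=5):
    """Return the backlog size after each time step."""
    lag = initial_lag
    history = []
    for _ in range(steps):
        # Backlog changes by (incoming - processed) and cannot drop below zero.
        lag = max(0.0, lag + edit_rate - throughput)
        history.append(lag)
    return history


if __name__ == "__main__":
    # Throughput below the edit rate: the backlog keeps growing.
    print(simulate_lag(edit_rate=10, throughput=8, steps=3))
    # Throughput above the edit rate: an existing backlog drains.
    print(simulate_lag(edit_rate=10, throughput=12, initial_lag=5, steps=3))
```

In this model, any fallback path that pushes throughput below the edit rate makes the lag grow without bound until throughput is restored or traffic is moved elsewhere.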