Search Platform/Weekly Updates/2023-05-18
Summary
Working on the post-mortem of the WDQS outage, Search Update pipeline, and optimizing Wikibase index settings.
What we've accomplished
Search - Analysis
- Continuing data analysis for apostrophe-like characters (T315118). There are 22 candidate characters, and they are treated differently by different tokenizers and by ICU normalization and ICU folding. (The Hebrew tokenizer straight up converts 5 of them to apostrophes, including Hebrew geresh, which I never noticed before!)
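
To give a rough feel for why these characters need case-by-case analysis, here is a minimal sketch that inspects a few apostrophe-like code points. The `SAMPLE` list and the `describe` helper are illustrative assumptions, not the actual 22 candidates from T315118, and Python's standard-library NFKC normalization is used only as a stand-in: it is not the same as the ICU normalization or ICU folding filters used in the search analysis chain, which may behave differently.

```python
import unicodedata

# Illustrative sample only; NOT the actual list of 22 candidates from T315118.
SAMPLE = [
    "\u0027",  # plain ASCII apostrophe
    "\u2019",  # typographic (curly) apostrophe
    "\u02BC",  # modifier letter apostrophe
    "\u05F3",  # Hebrew geresh
    "\uFF07",  # fullwidth apostrophe
]

def describe(ch):
    """Return basic Unicode properties of ch and its NFKC-normalized form."""
    # NFKC is a stand-in here; it is not ICU normalization or ICU folding.
    nfkc = unicodedata.normalize("NFKC", ch)
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "nfkc": " ".join(f"U+{ord(c):04X}" for c in nfkc),
        "becomes_apostrophe": nfkc == "'",
    }

for ch in SAMPLE:
    print(describe(ch))
```

Even in this small sample the characters do not all share a Unicode general category, and only some of them collapse to the ASCII apostrophe under NFKC, so a tokenizer or filter that keys on either property will split the group differently.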
Operations / SRE
- Ironed out the last performance issues of the WDQS SLO dashboard - https://phabricator.wikimedia.org/T313751
- Deploying new Turkish analyzer https://phabricator.wikimedia.org/T332355
- Deploy a newer version of Flink to rdf-streaming-updater; this should eliminate the bug that caused last week's outage. https://phabricator.wikimedia.org/T334244
- Audit NIC firmware in preparation for Buster EOL https://phabricator.wikimedia.org/T331297