Search Platform/Weekly Updates/2023-08-04
Summary
The team is focused on the Search Update Pipeline and on improving the multilingual zero-results rate.
What we've accomplished
Search Update Pipeline
- Prep work and fixes required before implementing support for page re-render - https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/8
- Dashboard created to visualize inconsistencies between MediaWiki and the search indices. It isn't clear yet how we can use this data as the basis for an SLO; we need more data points - https://superset.wikimedia.org/superset/dashboard/451/?native_filters_key=rY5v_sSw_7HBDwck-cepsTza3yzYSckh6HxJ7M6ZyrKcy9R5p3oW48lYXIb7MtKi
Improve multilingual zero-results rate
- Ongoing work on standardization of ASCII folding and ICU folding - https://phabricator.wikimedia.org/T332342
Create project plan for WDQS graph splitting
- Exciting early results from the WMDE analyst: https://phabricator.wikimedia.org/T342111 (TL;DR: extracting scholarly articles using a very simple rule yields almost a 50-50 split, with a negligible number of overlapping triples, ~200k).
Misc
- Removing dependencies in CirrusSearch integration tests - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/942620