Search Platform/Weekly Updates/2024-03-22
Summary
This was a slow week, with most of the team traveling and recovering after our in-person offsite last week.
The full deployment of the Search Update Pipeline work is almost certainly going to spill over to next quarter. We don't expect further surprises; the deployment will roll out progressively to more wikis.
What we've accomplished
Improve multilingual zero-results rate
- The dotted I fix code is up and ready for review, but Gerrit forgot to tag any reviewers. (It's just random enough about it that I haven't learned to check every time!) The write-up is also done (https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Analyzer_Harmonization_Notes#Generally_Enable_dotted_i_fix_(T358495)). There was a fair amount of refactoring, but the core of it was straightforward. - https://phabricator.wikimedia.org/T358495
- Reindexing completed on the main search clusters, which enables a number of analysis changes (apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair) - https://phabricator.wikimedia.org/T342444
WDQS graph splitting
- Documented limits of federation: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits
- Further performance analysis of various hardware options - https://phabricator.wikimedia.org/T359062