Search Platform/Weekly Updates/2024-08-02
This page is currently a draft.
More information and discussion about changes to this draft on the talk page.
Summary
Event Stream events have been introduced to support private streams for the Search Update Pipeline ("SUP"). We anticipate that rollout of SUP for private wikis will begin next week or the week after, which will make their search updates work more like those of the public wikis.
The search metrics dashboard now has a number of improvements:
- Fulltext search now includes zero-result session rates
- It's possible to filter based on project and wiki language
Final pieces are coming into place for the initial WDQS graph split in production. The initial import is anticipated to be ready sometime next week, with an announcement to follow the week after that.
What we've accomplished
WDQS graph splitting
- Final pieces are being arranged for the initial deployment of the production graph split, with a data import set to run and then be verified next week. https://phabricator.wikimedia.org/T370754 and https://phabricator.wikimedia.org/T364367
- A communication for the initial availability of a production graph split was prepared, and is slated for the week after next.
Search Update Pipeline / Private Wikis
- Events have been configured for private wikis. Some configuration changes were needed to correct system behavior upon deployment; with those resolved, rollout is targeted for next week or the week after. https://phabricator.wikimedia.org/T346046
Improve multilingual zero-results rate
- Review of ASCII- vs. ICU-Folding configuration. https://phabricator.wikimedia.org/T332342
- Review of pre- and post-harmonization work. https://phabricator.wikimedia.org/T219550
- Discussion of the next targets for the multilingual zero-results rate stream of work after ASCII-folding/ICU-folding. There are opportunities for language-specific improvements, as well as for addressing "wrong keyboard" issues in a generalized way.
- Split abandonment into with-results and zero-result sessions in the search metrics dashboard. https://phabricator.wikimedia.org/T370290
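To make the dashboard split concrete, here is a minimal sketch of how zero-result and abandonment rates could be computed per session. It is illustrative only and is not the dashboard's actual query or definitions: the `Session` record, the `summarize` function, and the definitions used (a zero-result session is one in which no search returned results; an abandoned session is one with no click) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical record; field names are assumptions for illustration.
    result_counts: list  # number of results returned by each search in the session
    clicked: bool        # whether the user clicked any result


def has_results(session):
    """True if at least one search in the session returned results."""
    return any(count > 0 for count in session.result_counts)


def summarize(sessions):
    """Return session-level rates as fractions of all sessions."""
    total = len(sessions)

    def rate(n):
        return n / total if total else 0.0

    abandoned = [s for s in sessions if not s.clicked]
    zero_result = sum(1 for s in sessions if not has_results(s))
    abandoned_zero = sum(1 for s in abandoned if not has_results(s))
    abandoned_with = len(abandoned) - abandoned_zero

    return {
        "zero_result_session_rate": rate(zero_result),
        "abandoned_with_results_rate": rate(abandoned_with),
        "abandoned_zero_result_rate": rate(abandoned_zero),
    }
```

Separating the two abandonment figures matters because abandoning after seeing results suggests a relevance problem, while abandoning after zero results suggests a recall problem.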
Search backend replacement
- The search backend replacement decision record has been reviewed within Search Platform and with its management. Next week it will be raised in the cross-org SRE meeting for any final review. https://www.mediawiki.org/wiki/Wikimedia_Search_Platform/Decision_Records/Search_backend_replacement_technology
FY 24-25 WE.3.1: Browsing & learning experiences
- Connections between software engineering, data analysis, and product management have been established for the Web team's upcoming work on recommendations and retention. Team members met to discuss initial considerations for the phases of development and for measurement, and they will meet again post-Wikimania.
Vector based search
- We've been discussing ideas for technical experiments with vector based search, which we would want to conduct on OpenSearch. Discussions about vector embeddings are occurring in parallel in various workstreams (one of them is WE.3.1, but it is still early). Some vector based searches may be suitable for lower-volume on-demand calls, while others may be appropriate for OpenSearch serving.
Misc
- A pool counter for GeoData was introduced to address search pool saturation caused by a user agent with unexpected behavior. https://phabricator.wikimedia.org/T370621
- Investigated the lower-than-expected parser cache hit rate and determined its source: the hit rate is 20%, not 0.05%, and the very low figure was caused by the configuration of the dashboard. https://phabricator.wikimedia.org/T370796
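For readers unfamiliar with the idea behind a pool counter, the sketch below shows the general technique: capping how many requests of one kind can run concurrently, so that a single misbehaving client cannot saturate a shared pool. This is a simplified in-process analogy, not the PoolCounter service or the GeoData configuration that was deployed; the class name `SimplePoolCounter` and the limit used are hypothetical.

```python
import threading


class SimplePoolCounter:
    """Hypothetical in-process limiter; an analogy for the concept only."""

    def __init__(self, max_workers):
        # Each running piece of work holds one slot until it finishes.
        self._slots = threading.BoundedSemaphore(max_workers)

    def try_run(self, work):
        """Run work() if a slot is free.

        Returns (True, result) when the work ran, or (False, None) when
        the pool was full and the request was rejected without waiting.
        """
        if not self._slots.acquire(blocking=False):
            return (False, None)
        try:
            return (True, work())
        finally:
            self._slots.release()
```

A caller might create `SimplePoolCounter(max_workers=1)` for one category of query and return an error to the client whenever `try_run` reports that the pool was full, rather than letting requests queue up.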