
Search Platform/Weekly Updates/2024-08-02


Summary

Event Stream events have been introduced to support private streams for the Search Update Pipeline ("SUP"). Rollout of SUP to private wikis is expected to begin next week or the week after. Private wikis will then receive search updates the same way the public wikis already do.

The search metrics dashboard now has several improvements:

  • Full-text search metrics now include zero-result session rates
  • Metrics can now be filtered by project and wiki language

The final pieces for the initial WDQS graph split in production are coming into place. The initial import is expected to be ready next week, with an announcement to follow the week after.

What we've accomplished

WDQS graph splitting

Search Update Pipeline / Private Wikis

  • Events have been configured for private wikis. Some configuration changes were needed to correct system behavior after deployment. With those resolved, rollout is targeted for next week or the week after. https://phabricator.wikimedia.org/T346046

Improve multilingual zero-results rate

  • Review of ASCII-folding vs. ICU-folding configuration. https://phabricator.wikimedia.org/T332342
  • Review of pre- and post-harmonization work. https://phabricator.wikimedia.org/T219550
  • Discussion of the next targets for the multilingual zero-results rate workstream after ASCII-folding/ICU-folding. Opportunities include some language-specific improvements and a generalized approach to "wrong keyboard" issues.
  • Split abandonment into with-results and zero-result sessions in the search metrics dashboard. https://phabricator.wikimedia.org/T370290

Search backend replacement

FY 24-25 WE.3.1: Browsing & learning experiences

  • Connections between software engineering, data analysis, and product management have been established for the Web team's upcoming work on recommendations and retention. Team members met to discuss initial considerations for the development phases and for measurement, and will meet again after Wikimania.
  • We've been discussing ideas for technical experiments with vector-based search, which we would want to run on OpenSearch. Discussions about vector embeddings are also taking place in parallel in various workstreams (WE.3.1 is one of them, though it is still early). Some vector-based searches may suit lower-volume on-demand calls, while others may be appropriate to serve from OpenSearch.

Misc

  • A pool counter for GeoData was introduced to address search pool saturation caused by a user agent with unexpected behavior. https://phabricator.wikimedia.org/T370621
  • Investigated the source of the lower-than-expected parser cache hit rate. The actual hit rate is 20%, not 0.05%. The very low figure came from the dashboard's configuration. https://phabricator.wikimedia.org/T370796