Search Platform/Weekly Updates/2024-09-20

From Wikitech

Summary

We closed a number of operational tasks and shipped quality-of-life improvements for our users this week.

Good progress was made on deploying ICU folding to most of the languages supported by Search.

What we've accomplished

WDQS graph splitting

Improve multilingual zero-results rate

General task: https://phabricator.wikimedia.org/T332342

  • I have finished Nepali (after deciding not to do anything special with the occasional Tibetan script), Assamese, and Punjabi. Only Oriya is left, followed by a quick re-check that everything works as expected, a quick code review, and a new patch on Friday or Monday.
  • ICU folding configs are done for Marathi, Burmese, Malayalam, Telugu, Sinhala, Kannada, and Gujarati.
  • I was able to do some additional needed normalization for Marathi, Malayalam, Sinhala, and Gujarati, which is a very nice bonus. I did some minor refactoring so all those share a `case` statement, too.
  • Nepali is in progress. I think the configs for Devanagari are done, but I'm looking into the lesser-used Tibetan script (which occurs regularly on-wiki), and that config may be reusable for the Tibetan language, too.
  • After that, Assamese, Punjabi, and Oriya are left. Each language config takes between 20 minutes and 2 days, though I'm averaging ~2½ configs per day and hoping to finish the configs for the Indic languages by the end of the week.

Search Metrics

Misc / Operations