Search Platform/Weekly Updates/2025-02-21
Ongoing work
Article country model (WE2.5.1)
- Started to backfill articlecountry weighted_tags from stat1009 (https://phabricator.wikimedia.org/T385970)
- stopped for a while because it caused lag issues
- resumed at a lower rate (20/sec instead)
- Optimize the PageAssessment extension to reduce the number of tags sent to help with the backfill of articlecountry
- Investigated why the SUP is running close to its max throughput; filed https://phabricator.wikimedia.org/T386935 to continue the investigation
- Starting work on T386068: implement articlecountry as a new CirrusSearch keyword
Language Stuff: Kuromoji/Sudachi
- Finally finished the Kuromoji/Sudachi analysis and configuration updates! They are currently in review. We decided in the Wednesday Meeting to delay deployment of the plugin (and Japanese reindexing and MLR work) until after the OpenSearch migration is complete. [TJ]
- I decided to abandon the plan to work with custom dictionaries after finding a sufficiently robust approach to clean up almost all of the problems with Sudachi in the analysis config. This is less brittle and less of a maintenance burden.
- We will try to port Sudachi to OpenSearch 1.x when we get there, but if it doesn't work out (for technical or scheduling reasons) we can fall back to Kuromoji and its custom config until we eventually upgrade to OpenSearch 2.x.
- I also opened an upstream ticket to improve the Sudachi dictionary: https://github.com/WorksApplications/SudachiDict/issues/48
Misc / Operations
- T386638 I can't authenticate in Wikimedia Commons Query Service
- T383571 Mjolnir failures in feature collection task
- Mjolnir is sometimes stuck in feature selection (https://phabricator.wikimedia.org/T383218).
- After the latest patches (driver memory bump), Mjolnir seems more stable and the task was successful in two training runs.
- Made a stable branch of the cirrus-reindex-orchestrator and added a note about what version to use while mwscript-k8s gets fixed (https://gitlab.wikimedia.org/repos/search-platform/cirrus-reindex-orchestrator)