Search Platform/Weekly Updates/2025-02-14
Ongoing work
MLR Improvements
- Deploy and test new MLR models (https://phabricator.wikimedia.org/T385972)
- We deployed the learning-to-rank (ML) models tested last quarter to all wikis with MLR enabled. We started a new A/B experiment to benchmark those models against recently trained ones.
WDQS graph split
- Worked on a small patch to improve the parsing of RDF dumps in Hadoop, which might change slightly due to https://phabricator.wikimedia.org/T384344
WDQS Expose RDF stream publicly
- Started documentation at https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater/Public_Update_Stream
Search Update Pipeline - Weighted Tags
Elasticsearch -> OpenSearch migration
- Upgraded all the plugins to OpenSearch 1.3.20, updated the cirrus dev image to the same version, and switched cindy to 1.3.20 as well (https://phabricator.wikimedia.org/T385005)
Misc / Operations
- Multiple instabilities with Mjolnir (machine learning training pipeline):
- T383218 Mjolnir is sometimes stuck in feature selection
- T383870 mjolnir should pin refinery jar version explicitly
- Implemented the workarounds proposed in T383218 to mitigate the impact on DAGs sharing the same Airflow pool. This implementation has an issue, though: it causes tasks to execute out of order.
- Bumped the Spark driver memory for the offending task, which will hopefully help. If not, the next step is to dig into the code path executed by this task.
- Investigated lag issues in the categories query service. They were caused by discrepancies in MariaDB index names on some dbstore hosts and were fixed by Amir (https://phabricator.wikimedia.org/T386005)
What we've accomplished
Misc / Operations
- We have completed our migration from Graphite to Prometheus for all Search-related metrics!