Search Platform/Weekly Updates/2023-03-24
Summary
The Spark 3 upgrade is close to completion. Some work might overflow into Q4 to address the last few issues.
The focus remains on the Spark 3 upgrade rather than the Search Update Pipeline work. Progress on the Search Update Pipeline has been minimal, but will pick up in Q4.
We're very close to completing the unpacking of all Search Analyzers: only Brazilian and Estonian are left, and both are planned for Q4. This will enable us to implement future Search improvements uniformly across languages.
What we've accomplished
Spark 3 Upgrade
- Migrated a long list of DAGs to Spark 3 (see the subtasks of https://phabricator.wikimedia.org/T318414 for details).
Search Analyzers
- As part of the unpacking work, the Romanian analyzer was improved to account for changes in how ș and ț are written. There is an interesting backstory (both on the ticket and in the write-up on this work) about how technical limitations influence languages. This is also a good example of how small changes can have a very positive impact on Search in specific languages, and of how this aligns with our aim to support underserved communities as well as mainstream languages. Note that a reindex is still needed before this work is available on-wiki.
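As an illustration only: this sketch assumes the change is the well-known split between the cedilla forms ş (U+015F) and ţ (U+0163), which older encodings forced writers to use, and the standard comma-below forms ș (U+0219) and ț (U+021B). Folding the legacy forms into the standard ones lets both spellings match the same indexed terms. The function name `normalize_romanian` is hypothetical; inside an Elasticsearch analyzer the same idea would more typically be expressed as a `mapping` character filter than as Python code.

```python
# Hypothetical sketch: fold legacy cedilla forms of Romanian s/t into the
# standard comma-below forms so both spellings produce the same tokens.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s with cedilla      -> s with comma below
    "\u015e": "\u0218",  # S with cedilla      -> S with comma below
    "\u0163": "\u021b",  # t with cedilla      -> t with comma below
    "\u0162": "\u021a",  # T with cedilla      -> T with comma below
})


def normalize_romanian(text: str) -> str:
    """Return text with cedilla s/t replaced by their comma-below equivalents."""
    return text.translate(CEDILLA_TO_COMMA)


if __name__ == "__main__":
    legacy = "\u015fi \u0163ar\u0103"    # "and country" typed with cedilla forms
    standard = "\u0219i \u021bar\u0103"  # the same words with comma-below forms
    print(normalize_romanian(legacy) == standard)  # True
```

Applying the same normalization at both index and query time is what makes the two spellings interchangeable for search.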
Search Update Pipeline
- Made progress on deploying the WDQS streaming updater with the k8s operator. The JobManager starts, but fails when contacting Swift, most probably because secrets are not being passed properly. Finding the right configuration has taken a lot of back and forth.
- Extracting the differences between the MySQL replicas (copied monthly to HDFS) and the CirrusSearch indices to create an SLI on the health of the update pipeline. This has already uncovered a number of issues, mostly with redirects of JS/CSS pages and some missing updates.
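A minimal sketch of the comparison behind such an SLI, assuming both sides can be reduced to a map from page ID to latest revision ID. The names `compare_snapshot` and `DriftReport`, and the specific SLI definition (share of snapshot pages indexed at or after the snapshot revision), are hypothetical; the actual job works on the monthly HDFS copies rather than small in-memory dicts.

```python
from dataclasses import dataclass


@dataclass
class DriftReport:
    missing: set[int]  # pages in the replica snapshot but absent from the index
    stale: set[int]    # pages indexed at an older revision than the snapshot
    extra: set[int]    # pages in the index that the snapshot does not contain
    sli: float         # fraction of snapshot pages that are indexed and up to date


def compare_snapshot(replica: dict[int, int], index: dict[int, int]) -> DriftReport:
    """Compare hypothetical page_id -> revision_id maps from the replica and the index."""
    missing = set(replica.keys() - index.keys())
    extra = set(index.keys() - replica.keys())
    # A newer revision in the index is assumed to be an edit made after the
    # monthly snapshot, so only an index revision *older* than the snapshot counts.
    stale = {p for p in replica.keys() & index.keys() if index[p] < replica[p]}
    healthy = len(replica) - len(missing) - len(stale)
    sli = healthy / len(replica) if replica else 1.0
    return DriftReport(missing, stale, extra, sli)


if __name__ == "__main__":
    replica = {1: 100, 2: 200, 3: 300, 4: 400}
    index = {1: 100, 2: 150, 4: 410, 5: 500}
    report = compare_snapshot(replica, index)
    print(report.missing, report.stale, report.extra, report.sli)
    # {3} {2} {5} 0.5
```

Because the replica copy is only monthly, pages deleted after the snapshot would still show up as missing here, so a production check would need to filter those out before treating the result as a health signal.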
Operations / SRE
- Fixed WDQS Federation with AGROVOC - https://phabricator.wikimedia.org/T328625
- Added WDQS Federation with https://data.europa.eu/sparql - https://phabricator.wikimedia.org/T331271
- Cleaned up the /wmf/data/discovery/transfer_to_es folder in HDFS - https://phabricator.wikimedia.org/T323616
Misc
- New dashboard on Search Preview usage, showing that mobile usage is roughly twice desktop usage, likely because desktop users have hovercards.
- Cindy (automated end-to-end testing for CirrusSearch) is having issues again. The lack of support for testing with multiple wikis on Docker is a recurring issue for our team.