Search Platform/Weekly Updates/2023-09-15
Summary
We now track all the indicators we need for the Search SLOs. The next step is to create dashboards.
The Search Update Pipeline still has a few remaining features to address. In particular, handling late events and deduplicating events are proving a bit more complex than expected. We're moving forward on testing the deployment.
Performance work for "Improve multilingual zero-results rate" is ongoing.
We've finally removed the use of "whitelist" from the Wikidata Query Service code base, a good step toward using inclusive language in our work.
What we've accomplished
Search Update Pipeline (SUP)
- We're dropping support for Java 8 in the SUP. This will let us simplify the code a little (by dropping the compatibility libraries we currently use) and use newer Java features. On the other hand, it will prevent us from deploying SUP on YARN.
- We are being hit by a bug in Flink (https://issues.apache.org/jira/browse/FLINK-28758) that prevents us from properly testing the deployment of Flink with ZooKeeper. This raises the priority of migrating the WDQS Streaming Updater from the older Kafka / Flink integration to the newer APIs - https://phabricator.wikimedia.org/T326914
Improve multilingual zero-results rate
- Smarter handling of acronyms for word_break_helper in language analyzers. Some performance optimizations still need to be addressed. Write-up at https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Analyzer_Harmonization_Notes#word_break_helper_and_Acronyms_(T170625) - https://phabricator.wikimedia.org/T170625
- Ongoing performance testing of the various harmonisations - https://phabricator.wikimedia.org/T346051
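To illustrate the acronym item above, here is a minimal Python sketch of the general idea: collapse dotted acronyms before a word-breaking step splits on periods. It is not the actual analyzer configuration. The regular expression, the function names collapse_acronyms and analyze, and the simplified word_break_helper stand-in (which only maps periods and underscores to spaces) are all assumptions made for illustration. The linked write-up describes the real behaviour.

```python
import re

# Hypothetical illustration only; not the production analysis chain.
LETTER = r"[^\W\d_]"  # any Unicode letter (word character that is not a digit or "_")

# Assumed definition of an acronym: at least two "letter + period" pairs,
# optionally followed by one final letter, not embedded in a longer word.
ACRONYM = re.compile(
    r"(?<!" + LETTER + r")(?:" + LETTER + r"\.){2,}" + LETTER + r"?(?!" + LETTER + r")"
)


def collapse_acronyms(text):
    """Remove the periods inside dotted acronyms, e.g. N.A.S.A. becomes NASA."""
    return ACRONYM.sub(lambda m: m.group(0).replace(".", ""), text)


def word_break_helper(text):
    """Simplified stand-in: treat periods and underscores as word breaks."""
    return re.sub(r"[._]", " ", text)


def analyze(text):
    """Collapse acronyms first, then break words and split on whitespace."""
    return word_break_helper(collapse_acronyms(text)).split()


if __name__ == "__main__":
    print(word_break_helper("N.A.S.A.").split())  # without acronym handling
    print(analyze("N.A.S.A. launched"))           # with acronym handling
    print(analyze("en.wikipedia.org"))            # domain names still split
```

Running the acronym step first is the design choice that matters here: once periods have been turned into spaces, an acronym is indistinguishable from a run of single-letter words.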
Search SLOs
- All defined SLIs are tracked; we still need dashboards - https://phabricator.wikimedia.org/T335499
Operations
- Migrate search-loader hosts to Bullseye or later - https://phabricator.wikimedia.org/T346039
- Allow WDQS federation with data.nlg.gr. Note that federation does not actually work yet, since data.nlg.gr has specific HTTP header requirements that we are not going to support. If those requirements are relaxed, federation should start working - https://phabricator.wikimedia.org/T337296
- Retune enwiki_content shard settings - https://phabricator.wikimedia.org/T343820
Misc
- Removed the usage of "whitelist" in Wikidata Query Service code, replacing it with "allowlist" - https://phabricator.wikimedia.org/T344284
- Partial fix for file uploads not being indexed properly. The root cause is a race condition in MediaWiki, which has been identified and documented but will not be fixed by our team (https://phabricator.wikimedia.org/T344285). Our partial fix reduces the frequency of this issue but does not solve it entirely - https://phabricator.wikimedia.org/T342562